ceph - sepia - 2024-10-25

Timestamp (UTC)  Message
2024-10-25T06:06:10.488Z
<Anoop C S> `ceph-mgr-dashboard` requires `python3-grpcio-tools`, which in turn depends on _libprotobuf.so.25_ and _libprotoc.so.25_, provided by `protobuf` and `protobuf-compiler` respectively. But `protobuf-compiler` comes from CRB, which is not enabled by default in ceph container images. Thus we end up with the update issue mentioned earlier, because newer versions of both `protobuf` and `protobuf-compiler` are available in the standard Appstream and CRB repositories. I hope it's clear now.
2024-10-25T07:05:20.808Z
<Guillaume Abrioux> not sure i'm following @Anoop C S
2024-10-25T07:05:31.599Z
<Guillaume Abrioux> are you rebuilding an image from the one we build?
2024-10-25T07:59:23.614Z
<Anoop C S> We are using a ceph container image as a base to build another CI image for go-ceph. As part of the build process we encountered the issue while running `dnf update` with squid.
2024-10-25T08:03:12.852Z
<Guillaume Abrioux> ok, makes sense then
2024-10-25T08:03:58.871Z
<Guillaume Abrioux> which image are you using exactly ?
2024-10-25T08:04:04.866Z
<Guillaume Abrioux> [quay.io/ceph/ceph:v19](http://quay.io/ceph/ceph:v19) ?
2024-10-25T08:04:17.253Z
<Anoop C S> Yes.
2024-10-25T08:06:45.859Z
<Dan Mick> oh.  so 1) why is protobuf in both repos, and 2) should the Ceph container be using appstream instead of crb, maybe, and 3) how do we do it to not break the existing installation of protobuf for python3-grpcio-tools, etc.
2024-10-25T08:07:34.170Z
<Dan Mick> the container build uses dnf --enablerepo=crb
2024-10-25T08:07:45.037Z
<Guillaume Abrioux> yeh afaik we enable crb
2024-10-25T08:07:52.691Z
<Dan Mick> can the go-ceph container also use --enablerepo?
2024-10-25T08:07:58.617Z
<Anoop C S> 1. `protobuf` comes from Appstream and `protobuf-compiler` from CRB.
2024-10-25T08:08:33.913Z
<Dan Mick> are appstream-protobuf and crb-protobuf-compiler compatible versions?
2024-10-25T08:08:54.207Z
<Anoop C S> AFAICT, yes.
2024-10-25T08:09:50.949Z
<Anoop C S> `protobuf` is the base package and `protobuf-compiler` is one of its sub-packages.
2024-10-25T08:11:31.314Z
<Dan Mick> protobuf-3.14.0-14.el9.src.rpm for both, looks like, currently, in stream9
2024-10-25T08:11:32.355Z
<Anoop C S> > can the go-ceph container also use --enablerepo?
That's a workaround and it is in place right now for us. But I think this should be taken care of from ceph container image build process.
2024-10-25T08:12:48.350Z
<Dan Mick> well.  the ceph container isn't meant to support dnf upgrade
2024-10-25T08:13:15.210Z
<Dan Mick> I don't know that it would be harmful if it did, but I wonder why the choice was made to leave crb disabled; probably because that's the default state
2024-10-25T08:13:50.424Z
<Anoop C S> Yes, CRB is not enabled by default for c9s.
2024-10-25T08:14:15.330Z
<Dan Mick> I know, the 'probably' was about 'why that choice was made to leave it that way in the container'
2024-10-25T08:16:06.376Z
<Dan Mick> I think there's a good chance that dnf upgrade would cause a lot more problems than it solves, now that I think about it.  What **ought** to happen is that periodic container updates happen
2024-10-25T08:17:08.799Z
<Dan Mick> and if the base contents of stream9's repos change, that's how they'd move.   I know the containers are periodically rebuilt, but I don't know what the trigger is, or if it ever involves changing the base OS packages
2024-10-25T08:17:27.059Z
<Dan Mick> and if it does I'm not sure it's a good idea 🙂 because it might invalidate testing
2024-10-25T08:18:13.174Z
<Dan Mick> actually I shouldn't say I know they're periodically rebuilt.  I think I believed that once but I was wrong
2024-10-25T08:18:24.604Z
<Anoop C S> FWIW, this has become an issue just because these packages exist in different repositories. Otherwise things would have gone smoothly.
2024-10-25T08:19:00.200Z
<Dan Mick> sure.  I would question that package repo design myself
2024-10-25T08:19:29.782Z
<Anoop C S> Since this is an official release version I assume this was built during 19.2.0 GA.
2024-10-25T08:19:51.806Z
<Anoop C S> ~a month ago..
2024-10-25T08:21:01.594Z
<Dan Mick> yeah.  and I now think that's frozen forever (or intended to be).  which of course raises the question "how are security updates to ceph containers handled".
2024-10-25T08:22:09.398Z
<Dan Mick> one hopes that if the distro updates packages, they don't break ceph packages (or indeed any installed app)
2024-10-25T08:22:22.620Z
<Dan Mick> but Stream may not make that promise, I'm honestly not sure
2024-10-25T08:23:47.137Z
<Dan Mick> I verified that v19 has 3.14.0-13 packages while the current stream9 repos have -14
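For illustration, the mismatch Dan found is only in the release field (`3.14.0-13` in the v19 image versus `3.14.0-14` in the current stream9 repos), which is exactly the kind of difference `dnf update` acts on. A minimal Python sketch of comparing such version-release strings, assuming purely numeric dot-separated segments (a dist tag such as `.el9` is simply ignored). This is a simplification, not RPM's actual `rpmvercmp` ordering, and the helper names `parse_vr` and `is_older` are hypothetical.

```python
def _numeric_prefix(part: str) -> tuple[int, ...]:
    """Collect leading all-digit dot-separated segments, e.g. '14.el9' -> (14,)."""
    nums = []
    for seg in part.split("."):
        if not seg.isdigit():
            break  # stop at the first non-numeric segment, such as a dist tag ('el9')
        nums.append(int(seg))
    return tuple(nums)


def parse_vr(vr: str) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Split 'VERSION-RELEASE' (e.g. '3.14.0-14.el9') into numeric tuples."""
    version, _, release = vr.partition("-")
    return _numeric_prefix(version), _numeric_prefix(release)


def is_older(installed: str, available: str) -> bool:
    """True if `installed` sorts before `available` under this simplified scheme."""
    # Tuples compare element by element: version first, then release.
    return parse_vr(installed) < parse_vr(available)


if __name__ == "__main__":
    # Versions mentioned in the thread: v19 image vs. current stream9 repos.
    print(is_older("3.14.0-13", "3.14.0-14"))
```

Real RPM ordering also handles epochs, alphanumeric segments, and `~`/`^` markers, so a production check would query `rpm`/`dnf` rather than reimplement the comparison.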
2024-10-25T08:24:29.521Z
<Anoop C S> Correct, but we have to deal with stream flow 😜 .
2024-10-25T08:24:37.592Z
<Anoop C S> What would be the next step here?
2024-10-25T08:24:43.272Z
<Dan Mick> well.  someone does, somehow.
2024-10-25T08:25:14.842Z
<Dan Mick> I don't know, it's an interesting question.  I'd like to get more opinions on what best practice is for distributed containers, I guess.
2024-10-25T08:25:30.784Z
<Dan Mick> and of course it's outside US work hours by some margin right now.
2024-10-25T08:25:52.936Z
<Dan Mick> let me discuss this with a few tomorrow.  If I don't get back to this thread, ping me.
2024-10-25T08:26:23.969Z
<Anoop C S> Alright, thanks for your inputs at this very late stage.
2024-10-25T08:27:24.772Z
<Dan Mick> My feeling is that updating inside a container is not desirable.  I think it would make support impossible, for example.
2024-10-25T08:28:22.488Z
<Anoop C S> Hm..I tend to agree.
2024-10-25T08:28:56.384Z
<Dan Mick> and it would be better to rebuild the container from first principles.  Right now, for Ceph release containers, that's a little tricky, but soon I hope to change that; CI containers are now being built with a much simpler process that should be relatively easy to reproduce.  What/how are these ceph-go containers used/supported?
2024-10-25T08:29:56.508Z
<Anoop C S> We don't officially release any go-ceph containers. They are only built within GitHub CI for validation purposes.
2024-10-25T08:30:31.193Z
<Dan Mick> maybe dnf update just shouldn't be part of that?
2024-10-25T08:30:57.960Z
<Anoop C S> Now that we're talking about it, I'm gonna check on that front.
2024-10-25T08:31:09.399Z
<Anoop C S> Why was it needed in the first place?
2024-10-25T08:31:24.328Z
<Dan Mick> (and of course ceph CI-built containers validate the OS packages as they go, hopefully so we can catch any breakage before release)
2024-10-25T08:32:15.402Z
<Dan Mick> is your CI build code public?
2024-10-25T08:32:56.895Z
<Dan Mick> go-ceph.git I assume
2024-10-25T08:33:48.214Z
<Anoop C S> We also consume continuously built ceph images from [quay.ceph.io](http://quay.ceph.io) where this wouldn't be a problem at all.
2024-10-25T08:34:20.414Z
<Dan Mick> where's the actual container test code
2024-10-25T08:34:29.483Z
<Dan Mick> (or container build code I should say)
2024-10-25T08:34:39.325Z
<Anoop C S> Latest updates are picked up during the build process. Nevertheless, this is the first time we've hit such an update issue in all this time.
2024-10-25T08:35:05.579Z
<Dan Mick> protobuf* is a relatively-new dependency
2024-10-25T08:35:24.600Z
<Anoop C S> > where's the actual container test code
<https://github.com/ceph/go-ceph/blob/master/testing/containers/ceph/Dockerfile>
2024-10-25T08:35:51.482Z
<Dan Mick> and I'd bet there aren't that many synchronized tools that have binary packages in different repos
2024-10-25T08:36:54.058Z
<Anoop C S> > protobuf* is a relatively-new dependency
and we were lucky enough to have an updated version 😆
2024-10-25T08:36:55.149Z
<Dan Mick> yeah, I think for released containers you're better off not updating; it would be a more valid test
2024-10-25T08:38:22.868Z
<Dan Mick> oh, interesting, John Mulligan is involved here
2024-10-25T08:39:18.484Z
<Dan Mick> I will upbraid^H^H^H^H^H^H discuss this with John 😄
2024-10-25T08:40:13.078Z
<Anoop C S> Yeah, I work with him as a maintainer of go-ceph. Make sure that you blame John left and right 🤭 .
2024-10-25T08:41:50.182Z
<Dan Mick> I am of course overstating.  I've never thought hard about this before.
2024-10-25T08:43:26.459Z
<Anoop C S> There you go: <https://github.com/ceph/go-ceph/commit/78ff00f50b42e5bebfa75bb2b67f90f7f5651f64>
2024-10-25T08:43:38.490Z
<Dan Mick> hahaha I was just posting that
2024-10-25T08:44:39.612Z
<Dan Mick> yeah, I have to sleep.  I'll bring it up with John tomorrow.  perhaps here in this thread.
2024-10-25T08:45:48.102Z
<Chris Harris> I had all my perf test runs fail last night with:
```HTTPConnectionPool(host='mira118.front.sepia.ceph.com', port=4000): Max retries exceeded with url: /cbt_performance (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f11a05209d0>: Failed to establish a new connection: [Errno 113] No route to host')) ```
Tried pinging the machine from `teuthology.front.sepia.ceph.com`, but it seems it is unreachable:
```PING mira118.front.sepia.ceph.com (172.21.9.120) 56(84) bytes of data.
From teuthology.front.sepia.ceph.com (172.21.0.51) icmp_seq=1 Destination Host Unreachable
From teuthology.front.sepia.ceph.com (172.21.0.51) icmp_seq=2 Destination Host Unreachable
From teuthology.front.sepia.ceph.com (172.21.0.51) icmp_seq=3 Destination Host Unreachable
^C
--- mira118.front.sepia.ceph.com ping statistics ---
4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3067ms```
Is this a known problem with `mira118.front.sepia.ceph.com`?
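The `[Errno 113] No route to host` error and the ICMP "Destination Host Unreachable" replies both point at the results host rather than at the perf tests themselves. A pre-flight reachability check could make such runs fail fast instead of failing at upload time. A minimal Python sketch, assuming the endpoint is the host and port from the error above; `can_connect` is a hypothetical helper, not part of `qa/tasks/cbt_performance.py`, and it only confirms that a TCP connection can be opened, not that the HTTP service behind it is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, timeouts, and errors such as
        # EHOSTUNREACH (errno 113, "No route to host") or ECONNREFUSED.
        return False
```

For example, `can_connect("mira118.front.sepia.ceph.com", 4000)` would have returned `False` while the host was unreachable.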
2024-10-25T09:15:59.530Z
<Anoop C S> Well..well..well. I was there before 😄.
<https://github.com/ceph/go-ceph/pull/910#issuecomment-1670798795>.

I think version-specific installation of _*-devel_ packages was the trigger for introducing `dnf update`. I'll continue discussing it with John in our dedicated go-ceph channel.
2024-10-25T18:19:18.238Z
<Dan Mick> No.  It could just be down
2024-10-25T19:26:16.847Z
<Zack Cerza> ```❯ rg mira118
qa/tasks/cbt_performance.py
11:server  = "http://mira118.front.sepia.ceph.com"```
@Chris Harris, I think you'd have to ask the author of the module 😕