ceph - sepia - 2024-10-23

Timestamp (UTC) | Message
2024-10-23T05:50:40.977Z
<Anoop C S> We have newer versions of _protobuf_ (and _protobuf-compiler_) available with CentOS Stream 9, which causes the following issue while performing `dnf update` with the **squid** container image:
```
Error: 
 Problem: protobuf-3.14.0-13.el9.i686 from appstream  does not belong to a distupgrade repository
  - package protobuf-compiler-3.14.0-13.el9.x86_64 from @System requires protobuf = 3.14.0-13.el9, but none of the providers can be installed
  - cannot install both protobuf-3.14.0-14.el9.x86_64 from appstream and protobuf-3.14.0-13.el9.x86_64 from @System
  - cannot install both protobuf-3.14.0-14.el9.x86_64 from appstream and protobuf-3.14.0-13.el9.x86_64 from appstream
  - cannot install the best update candidate for package protobuf-3.14.0-13.el9.x86_64
  - problem with installed package protobuf-compiler-3.14.0-13.el9.x86_64
(try to add '--allowerasing' to command line to replace conflicting packages or '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
```
Can we have a **squid** rebuild to consume the latest packages from standard repositories?
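As an interim workaround until a rebuild lands, the flags dnf itself suggests in the error above can be applied inside the container; this is just a sketch of those suggestions, not a recommendation over the rebuild:

```shell
# Inside the squid container image.
# Let dnf erase the conflicting older/i686 protobuf packages and pull the
# 3.14.0-14.el9 update in their place:
dnf update --allowerasing

# Alternatively, accept non-best candidates so the stale protobuf pair is
# simply left at its current version:
dnf update --nobest
```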
2024-10-23T09:49:59.357Z
<Vallari Agrawal> Unable to ssh to teuthology machine, something seems wrong: https://files.slack.com/files-pri/T1HG3J90S-F07SQ6TPNMV/download/image.png
2024-10-23T12:23:32.846Z
<Ronen Friedman> same here. My active connection dropped.
2024-10-23T15:23:01.619Z
<yuriw> +1
2024-10-23T15:35:58.780Z
<Zack Cerza> lots of errors in the rhev event log over the last several days; curious if @Adam Kraitman or @Dan Mick have been aware of those issues
2024-10-23T15:46:22.267Z
<Zack Cerza> iscsi again. from hv03, which is flapping in rhev:
`Oct 23 14:45:10 hv03.front.sepia.ceph.com iscsid[2183]: iscsid: Connection1855:0 to [target: iqn.2003-01.com.redhat.iscsi-gw:lrc-iscsi1, portal: 172.21.2.202,3260] through [iface: default] is shutdown.`
The IP above is reesi002. From that host:
`reesi002 kernel: iSCSI Initiator Node: iqn.1994-05.com.redhat:8a2a399b7e6f is not authorized to access iSCSI target portal group: 4.`
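That "not authorized" message usually means the initiator IQN (iqn.1994-05.com.redhat:8a2a399b7e6f) has no ACL entry on target portal group 4. A hedged way to check on the gateway (assuming targetcli/gwcli are available there; paths taken from the log lines above):

```shell
# Show the configured target tree, including TPGs and their ACLs:
gwcli ls /iscsi-targets 2>/dev/null || targetcli ls /iscsi

# Or inspect configfs directly: initiators authorized on tpgt_4 appear here.
ls /sys/kernel/config/target/iscsi/iqn.2003-01.com.redhat.iscsi-gw:lrc-iscsi1/tpgt_4/acls/
```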
2024-10-23T15:58:55.198Z
<Zack Cerza> hm. went to install iotop on reesi002 and it's taking _several_ minutes to install a 23kb package. 😬
2024-10-23T16:47:12.828Z
<Christina Meno> Dan has been notified and is inbound
2024-10-23T17:03:11.263Z
<Dan Mick> iscsi service is not running on reesi002
2024-10-23T17:05:21.014Z
<Dan Mick> restarted. flapping with errors about no such device
2024-10-23T17:06:06.351Z
<Dan Mick> something on the cluster
2024-10-23T17:08:13.399Z
<Dan Mick> OSError: [Errno 19] No such device: '/sys/kernel/config/target/core/user_0/lrc.lrc_vol' -> '/sys/kernel/config/target/iscsi/iqn.2003-01.com.redhat.iscsi-gw:lrc-iscsi1/tpgt_4/lun/lun_0/c8e2c1b23e'
2024-10-23T17:08:25.815Z
<Dan Mick> I guess that's the kernel
2024-10-23T17:08:28.788Z
<Dan Mick> I'll reboot
2024-10-23T17:16:27.326Z
<Dan Mick> back up, iscsi storage domain is green on RHEV, all hv hosts up, all vms except teuthology up, status "paused due to lack of storage space", investigating
2024-10-23T17:17:30.184Z
<Dan Mick> kicked it, it seems to have restarted
2024-10-23T17:17:43.636Z
<Dan Mick> ssh works
2024-10-23T17:38:35.984Z
<Zack Cerza> ah wow, since the teuthology vm was paused it was able to come back pretty gracefully
2024-10-23T17:38:53.723Z
<Zack Cerza> thanks for fixing that up Dan. was it more than just the reboot?
2024-10-23T17:43:47.729Z
<Dan Mick> no, not really; the reboot involved acknowledging the BIOS complaint about SMART, and I couldn't find a way into BIOS to disable it as I have on other reesi
2024-10-23T17:43:55.044Z
<Dan Mick> so I had to watch the console to get the reboot to happen
2024-10-23T17:44:52.553Z
<Dan Mick> I guess I could work that out on reesi003, which is falsely decommissioned based on the same issue until I found the BIOS was overcautious
2024-10-23T18:09:31.933Z
<Dan Mick> the reference to tpgt_4 makes me think that it might have been connected to the things I've noticed on the hvNN RHEV VMhosts that seem to have a path to not only target 0 but targets of higher numbers.  I don't know why that is or how it should be corrected yet
2024-10-23T18:10:09.563Z
<Dan Mick> there is only one actual target image, shared by three gateways (on reesi002, 4, and 5).  I would expect three paths to it, and only three
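A quick way to verify the session and path count from an hvNN host would be something like this (a sketch; assumes open-iscsi and multipath-tools are installed on the hypervisor):

```shell
# One logged-in session per gateway portal; with three gateways we expect
# exactly three portals for the lrc-iscsi1 target:
iscsiadm -m session -P 1 | grep -E 'Target:|Current Portal:'

# The multipath map for the shared LUN should likewise list three paths:
multipath -ll
```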
2024-10-23T19:14:15.044Z
<Dan Mick> @Zack Cerza do you think we should try running that script against our fog instance?
2024-10-23T19:14:29.794Z
<Dan Mick> I did notice that it has an insane amount of history
2024-10-23T19:14:48.142Z
<Dan Mick> (and that probably explains the mysql process RSS too)
2024-10-23T20:37:59.689Z
<Dan Mick> @nehaojha I filed RITM1921261 to start an investigation of senta02.  What's on that host?
2024-10-23T20:41:36.323Z
<nehaojha> I used to access teuthology logs using it but I can always use vossi.
2024-10-23T21:10:49.146Z
<Zack Cerza> yeah, I think we should. especially since I can now see that teuthology is in fact requesting the right OS, and fog is installing something else. Would be helpful to be able to see per-host history.
2024-10-23T22:10:26.851Z
<Dan Mick> ok.  I'll see about getting db access
2024-10-23T22:26:40.616Z
<Dan Mick> well, it appears as though multicastSessions is empty
2024-10-23T22:27:52.559Z
<Dan Mick> but imagingLog and tasks have 1.5M records each
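If we decide to prune that history, something along these lines against the fog database could trim it (a sketch only: the date column names are assumptions, not verified against the FOG schema, and a dump should be taken first):

```shell
# Back up the two bloated tables before deleting anything:
mysqldump fog imagingLog tasks > fog-history-backup.sql

# Then drop records older than a year. NOTE: ilStartTime and taskCreateTime
# are assumed column names; confirm against the live schema first.
mysql fog -e "DELETE FROM imagingLog WHERE ilStartTime < NOW() - INTERVAL 1 YEAR;"
mysql fog -e "DELETE FROM tasks WHERE taskCreateTime < NOW() - INTERVAL 1 YEAR;"
```

Trimming those 1.5M-row tables should also bring the mysql RSS mentioned above back down.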

Any issue? Please create an issue here and use the infra label.