ceph - sepia - 2024-09-03

Timestamp (UTC) | Message
2024-09-03T10:38:37.960Z
<Jose J Palacios-Perez> Hi All, a newbie question on the sepia lab infra, details in thread, thanks in advance 👍
2024-09-03T10:39:22.179Z
<Jose J Palacios-Perez> I've so far failed miserably trying to use the nvme drives in the o05 machine. To avoid or reduce disruption I tried to deploy within a container. Long story short: I first tried with a custom container that checks out the Ceph git repos and dependencies and builds them from source. But vstart was not able to use any of the nvme devices, for example:
--
```@9dfb097a4360:/ceph/build
[17:10:49]$ # MDS=0 MON=1 OSD=1 MGR=1 ../src/vstart.sh --new -x --localhost --without-dashboard --bluestore --redirect-output --bluestore-devs /dev/nvme9n1 --crimson --no-restart
All --bluestore-devs must refer to writable block devices```
--
(this is despite using --privileged=true when running the container).
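One thing that may be worth checking (a sketch, not verified on this host): the vstart error is essentially a writability test on the device node, so the NVMe namespace has to be visible inside the container as a block device that the invoking user can write to. Something along these lines, where `ceph-build` is a hypothetical image name:
```
# pass the NVMe namespace into the container explicitly
# (ceph-build is a placeholder for the custom build image)
sudo podman run -it --privileged \
    --device /dev/nvme9n1:/dev/nvme9n1 \
    ceph-build /bin/bash

# inside the container, roughly the same check vstart performs before
# printing "All --bluestore-devs must refer to writable block devices"
[ -b /dev/nvme9n1 ] && [ -w /dev/nvme9n1 ] \
    && echo "writable block device" \
    || echo "not usable by this user"
```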

I've tried lots of different approaches. To summarise, the latest I looked at was <https://github.com/ceph/ceph-container/>; I managed to build both the daemon-base and daemon containers, but when trying to create a cluster I get the following error:
--
```# echo $IMG
localhost/ceph/daemon:main-main-centos-stream9-x86_64
@o05:~
[08:50:32]$ # sudo mkdir /sys/fs/cgroup/memory/conmon
# podman run -d --net=host --log-level debug --privileged=true -v /etc/ceph:/etc/ceph -v /var/lib/ceph/:/var/lib/ceph/ -e MON_IP=172.21.64.5 -e CEPH_PUBLIC_NETWORK=172.21.64.5 $IMG ceph/mon
:
INFO[0000] Failed to add conmon to cgroupfs sandbox cgroup: creating cgroup for pids: mkdir /sys/fs/cgroup/pids/conmon: permission denied
[conmon:d]: failed to write to /proc/self/oom_score_adj: Permission denied
:
DEBU[0000] Shutting down engines

# podman ps
CONTAINER ID IMAGE   COMMAND  CREATED  STATUS   PORTS   NAMES
@o05:~```
--
(I have not tried to escalate privileges like su - )
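In case it helps, the failure above is conmon being refused a cgroup under the pids controller (the manual mkdir only covered the memory controller), which is what rootless podman on cgroup v1 tends to hit. Two things that might be worth trying, sketched from the same command line (untested on o05):
```
# option 1: run rootful, so podman may create cgroups under /sys/fs/cgroup/*
sudo podman run -d --net=host --privileged=true \
    -v /etc/ceph:/etc/ceph -v /var/lib/ceph/:/var/lib/ceph/ \
    -e MON_IP=172.21.64.5 -e CEPH_PUBLIC_NETWORK=172.21.64.5 \
    $IMG ceph/mon

# option 2: stay rootless but skip creating a dedicated cgroup for conmon
podman run -d --net=host --privileged=true --cgroups=no-conmon \
    -v /etc/ceph:/etc/ceph -v /var/lib/ceph/:/var/lib/ceph/ \
    -e MON_IP=172.21.64.5 -e CEPH_PUBLIC_NETWORK=172.21.64.5 \
    $IMG ceph/mon
```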

So my question is: how do you use the NVMe drives with a vstart cluster? (even if it's not within a container)
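For reference, a minimal sketch of the usual non-container approach (not specific to o05, and it assumes the device can be wiped): make the namespace writable for the build user, then hand it to vstart:
```
# make the NVMe namespace writable for the current non-root user;
# this does not survive a reboot or a udev re-trigger
sudo chown $USER /dev/nvme9n1

# ~/ceph/build is a placeholder for wherever the tree was built
cd ~/ceph/build
MDS=0 MON=1 OSD=1 MGR=1 ../src/vstart.sh --new -x --localhost \
    --without-dashboard --bluestore --redirect-output \
    --bluestore-devs /dev/nvme9n1 --crimson --no-restart
```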

(I was also looking at <https://wiki.sepia.ceph.com/doku.php?id=devplayground> for clues but was wondering whether I would mess things up).

Many thanks in advance!
2024-09-03T12:33:53.501Z
<Jose J Palacios-Perez> some repos seem broken though: https://files.slack.com/files-pri/T1HG3J90S-F07KP75GWUB/download/screenshot_2024-09-03_at_13.33.04.png
2024-09-03T12:37:08.781Z
<Jose J Palacios-Perez> ```[12:35:37]$ # dnf search cephadm --repo apt-mirror.front.sepia.ceph.com_lab-extras_8_
Last metadata expiration check: 5 days, 22:28:54 ago on Wed 28 Aug 2024 14:07:29 UTC.
No matches found.
@o05:~
[12:36:23]$ # dnf search ceph --repo apt-mirror.front.sepia.ceph.com_lab-extras_8_
======================================================================================================================== Name & Summary Matched: ceph ========================================================================================================================
centos-release-ceph-reef.noarch : Ceph Reef packages from the CentOS Storage SIG repository```: https://files.slack.com/files-pri/T1HG3J90S-F07KPAJDAM8/download/screenshot_2024-09-03_at_13.34.55.png
2024-09-03T12:37:19.320Z
<Jose J Palacios-Perez> I might try from source instead 🤞
2024-09-03T12:43:53.813Z
<Guillaume Abrioux> @Adam Kraitman <https://jenkins.ceph.com/job/ceph-pull-requests/142608/consoleFull#9417136787ba86caa-bd6c-4071-8005-3f6d80f92e07> I still see this issue, is it still under investigation?
2024-09-03T12:50:55.369Z
<Jose J Palacios-Perez> Nope :cry:, we probably need an upgrade from CentOS 8 to 9, since I managed to build within the container: https://files.slack.com/files-pri/T1HG3J90S-F07KP9P93KM/download/screenshot_2024-09-03_at_13.49.43.png
2024-09-03T14:08:56.627Z
<Matan Breizman> Hey @Adam Kraitman, can we please upgrade this machine to c9?
Thanks!
2024-09-03T18:10:38.689Z
<Adam Kraitman> Hey @Matan Breizman, I can upgrade it, but I'd prefer to do a fresh installation since it will be quicker. Can you open a tracker ticket for this task?
2024-09-03T18:25:55.658Z
<Dan Mick> Well, an issue was that a lot of protobuf files were apparently installed "by hand" (not part of any package). I have no idea what would have done that, or a way I'm really comfortable with for finding and deleting them (the best I'd come up with was a date range on m/ctime fed into a "which package owns this file" check). I hope to have some time today to look at an automatic way to identify non-packaged files that's reliable and doesn't produce false positives.
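(A rough sketch of the kind of check described above, assuming an RPM-based box; the prefixes are only examples and runtime-generated files will show up as noise:)
```
# list files under a few prefixes that no installed RPM claims ownership of
find /usr/include /usr/lib64 -type f -print0 2>/dev/null |
while IFS= read -r -d '' f; do
    rpm -qf "$f" >/dev/null 2>&1 || printf 'unowned: %s\n' "$f"
done
```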
2024-09-03T19:55:38.642Z
<Patrick Donnelly> @Dan Mick @Adam Kraitman can one of you please kick vossi04 which appears to have died?
2024-09-03T22:05:50.691Z
<Dan Mick> [7782197.176059] libceph: mon0 (2)127.0.0.1:40676 socket error on write
[7782197.432055] libceph: mon0 (2)127.0.0.1:40676 socket error on write
vossi04 [7782197.936058] libceph: mon0 (2)127.0.0.1:40676 socket error on write
login:
[7782199.095060] libceph: mon0 (2)127.0.0.1:40676 socket error on write
vossi04 login:
vossi04 login:
2024-09-03T22:06:23.802Z
<Dan Mick> mon2, mds0, as well
2024-09-03T22:07:07.174Z
<Dan Mick> doesn't appear to be responding to login efforts
2024-09-03T22:10:27.586Z
<Dan Mick> I guess I'll powercycle
