ceph - sepia - 2024-09-03

Timestamp (UTC) | Message
2024-09-03T10:38:37.960Z
<Jose J Palacios-Perez> Hi All, a newbie question on the sepia lab infra, details in thread, thanks in advance 👍
2024-09-03T10:39:22.179Z
<Jose J Palacios-Perez> I've so far failed miserably trying to use the nvme drives in the o05 machine. To avoid or reduce disruption I tried to deploy within a container. Long story short: I first tried with a custom container that checks out the Ceph git repos and dependencies and builds them from source. But vstart was not able to use any of the nvme devices, for example:
--
```@9dfb097a4360:/ceph/build
[17:10:49]$ # MDS=0 MON=1 OSD=1 MGR=1 ../src/vstart.sh --new -x --localhost --without-dashboard --bluestore --redirect-output --bluestore-devs /dev/nvme9n1 --crimson --no-restart
All --bluestore-devs must refer to writable block devices```
--
(this is despite using --privileged=true when running the container).
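One thing that may be worth checking (a sketch, not verified on this host): the vstart error is essentially a writability test on the device node, so the NVMe namespace has to be visible inside the container as a block device that the invoking user can write to. Something along these lines, where `ceph-build` is a hypothetical image name:
```
# pass the NVMe namespace into the container explicitly
# (ceph-build is a placeholder for the custom build image)
sudo podman run -it --privileged \
    --device /dev/nvme9n1:/dev/nvme9n1 \
    ceph-build /bin/bash

# inside the container, roughly the same check vstart performs before
# printing "All --bluestore-devs must refer to writable block devices"
[ -b /dev/nvme9n1 ] && [ -w /dev/nvme9n1 ] \
    && echo "writable block device" \
    || echo "not usable by this user"
```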

I've tried lots of different approaches. To summarise, the latest I looked at was <https://github.com/ceph/ceph-container/>; I managed to build both the daemon-base and daemon containers, but when trying to create a cluster I get the following error:
--
```# echo $IMG
localhost/ceph/daemon:main-main-centos-stream9-x86_64
@o05:~
[08:50:32]$ # sudo mkdir /sys/fs/cgroup/memory/conmon
# podman run -d --net=host --log-level debug --privileged=true -v /etc/ceph:/etc/ceph -v /var/lib/ceph/:/var/lib/ceph/ -e MON_IP=172.21.64.5 -e CEPH_PUBLIC_NETWORK=172.21.64.5 $IMG ceph/mon
:
INFO[0000] Failed to add conmon to cgroupfs sandbox cgroup: creating cgroup for pids: mkdir /sys/fs/cgroup/pids/conmon: permission denied
[conmon:d]: failed to write to /proc/self/oom_score_adj: Permission denied
:
DEBU[0000] Shutting down engines

# podman ps
CONTAINER ID IMAGE   COMMAND  CREATED  STATUS   PORTS   NAMES
@o05:~```
--
(I have not tried to escalate privileges like su - )
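In case it helps, the failure above is conmon being refused a cgroup under the pids controller (the manual mkdir only covered the memory controller), which is what rootless podman on cgroup v1 tends to hit. Two things that might be worth trying, sketched from the same command line (untested on o05):
```
# option 1: run rootful, so podman may create cgroups under /sys/fs/cgroup/*
sudo podman run -d --net=host --privileged=true \
    -v /etc/ceph:/etc/ceph -v /var/lib/ceph/:/var/lib/ceph/ \
    -e MON_IP=172.21.64.5 -e CEPH_PUBLIC_NETWORK=172.21.64.5 \
    $IMG ceph/mon

# option 2: stay rootless but skip creating a dedicated cgroup for conmon
podman run -d --net=host --privileged=true --cgroups=no-conmon \
    -v /etc/ceph:/etc/ceph -v /var/lib/ceph/:/var/lib/ceph/ \
    -e MON_IP=172.21.64.5 -e CEPH_PUBLIC_NETWORK=172.21.64.5 \
    $IMG ceph/mon
```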

So my question is: how do you use the NVMe drives with a vstart cluster? (even if it's not within a container)
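For reference, a minimal sketch of the usual non-container approach (not specific to o05, and it assumes the device can be wiped): make the namespace writable for the build user, then hand it to vstart:
```
# make the NVMe namespace writable for the current non-root user;
# this does not survive a reboot or a udev re-trigger
sudo chown $USER /dev/nvme9n1

# ~/ceph/build is a placeholder for wherever the tree was built
cd ~/ceph/build
MDS=0 MON=1 OSD=1 MGR=1 ../src/vstart.sh --new -x --localhost \
    --without-dashboard --bluestore --redirect-output \
    --bluestore-devs /dev/nvme9n1 --crimson --no-restart
```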

(I was also looking at <https://wiki.sepia.ceph.com/doku.php?id=devplayground> for clues but was wondering whether I would mess things up).

Many thanks in advance!
2024-09-03T12:33:53.501Z
<Jose J Palacios-Perez> some repos seem broken though: https://files.slack.com/files-pri/T1HG3J90S-F07KP75GWUB/download/screenshot_2024-09-03_at_13.33.04.png
2024-09-03T12:37:08.781Z
<Jose J Palacios-Perez> ```[12:35:37]$ # dnf search cephadm --repo apt-mirror.front.sepia.ceph.com_lab-extras_8_
Last metadata expiration check: 5 days, 22:28:54 ago on Wed 28 Aug 2024 14:07:29 UTC.
No matches found.
@o05:~
[12:36:23]$ # dnf search ceph --repo apt-mirror.front.sepia.ceph.com_lab-extras_8_
======================================================================================================================== Name & Summary Matched: ceph ========================================================================================================================
centos-release-ceph-reef.noarch : Ceph Reef packages from the CentOS Storage SIG repository```: https://files.slack.com/files-pri/T1HG3J90S-F07KPAJDAM8/download/screenshot_2024-09-03_at_13.34.55.png
2024-09-03T12:37:19.320Z
<Jose J Palacios-Perez> I might try from source instead 🤞
2024-09-03T12:43:53.813Z
<Guillaume Abrioux> @Adam Kraitman <https://jenkins.ceph.com/job/ceph-pull-requests/142608/consoleFull#9417136787ba86caa-bd6c-4071-8005-3f6d80f92e07> I still see this issue, is it still under investigation?
2024-09-03T12:50:55.369Z
<Jose J Palacios-Perez> Nope :cry:, we probably need an upgrade from CentOS 8 to 9, since I managed to build within the container: https://files.slack.com/files-pri/T1HG3J90S-F07KP9P93KM/download/screenshot_2024-09-03_at_13.49.43.png
2024-09-03T14:08:56.627Z
<Matan Breizman> Hey @Adam Kraitman, can we please upgrade this machine to c9?
Thanks!
2024-09-03T18:10:38.689Z
<Adam Kraitman> Hey @Matan Breizman, I can upgrade it, but I'd prefer to do a fresh installation since it will be quicker. Can you open a tracker ticket for this task?
2024-09-03T18:25:55.658Z
<Dan Mick> Well, an issue was that a lot of protobuf files were apparently installed "by hand" (not part of any package). I have no idea what would have done that, or a way I'm really comfortable with for finding and deleting them (the best I'd come up with was a date range on m/ctime fed into a "which package owns this file" check). I hope to have some time today to look at an automatic way to identify non-packaged files that's reliable and doesn't produce false positives.
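(A rough sketch of the kind of check described above, assuming an RPM-based box; the prefixes are only examples and runtime-generated files will show up as noise:)
```
# list files under a few prefixes that no installed RPM claims ownership of
find /usr/include /usr/lib64 -type f -print0 2>/dev/null |
while IFS= read -r -d '' f; do
    rpm -qf "$f" >/dev/null 2>&1 || printf 'unowned: %s\n' "$f"
done
```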
2024-09-03T19:55:38.642Z
<Patrick Donnelly> @Dan Mick @Adam Kraitman can one of you please kick vossi04 which appears to have died?
2024-09-03T22:05:50.691Z
<Dan Mick> [7782197.176059] libceph: mon0 (2)127.0.0.1:40676 socket error on write
[7782197.432055] libceph: mon0 (2)127.0.0.1:40676 socket error on write
vossi04 [7782197.936058] libceph: mon0 (2)127.0.0.1:40676 socket error on write
login:
[7782199.095060] libceph: mon0 (2)127.0.0.1:40676 socket error on write
vossi04 login:
vossi04 login:
2024-09-03T22:06:23.802Z
<Dan Mick> mon2, mds0, as well
2024-09-03T22:07:07.174Z
<Dan Mick> doesn't appear to be responding to login efforts
2024-09-03T22:10:27.586Z
<Dan Mick> I guess I'll powercycle
