2024-09-03T10:38:37.960Z | <Jose J Palacios-Perez> Hi All, a newbie question on the sepia lab infra, details in thread, thanks in advance 👍 |
2024-09-03T10:39:22.179Z | <Jose J Palacios-Perez> I've so far failed miserably trying to use the NVMe drives in the o05 machine. To avoid or reduce disruption I tried to deploy within a container. Long story short: I first tried a custom container that checks out the Ceph git repos and their dependencies and builds them from source, but vstart was not able to use any of the NVMe devices. For example:
--
```@9dfb097a4360:/ceph/build
[17:10:49]$ # MDS=0 MON=1 OSD=1 MGR=1 ../src/vstart.sh --new -x --localhost --without-dashboard --bluestore --redirect-output --bluestore-devs /dev/nvme9n1 --crimson --no-restart
All --bluestore-devs must refer to writable block devices```
--
(this happens despite using --privileged=true when running the container).
I've tried lots of different approaches. To summarise, the latest thing I looked at was <https://github.com/ceph/ceph-container/>: I managed to build both the daemon-base and daemon containers, but when trying to create a cluster I get the following error:
--
```# echo $IMG
localhost/ceph/daemon:main-main-centos-stream9-x86_64
@o05:~
[08:50:32]$ # sudo mkdir /sys/fs/cgroup/memory/conmon
# podman run -d --net=host --log-level debug --privileged=true -v /etc/ceph:/etc/ceph -v /var/lib/ceph/:/var/lib/ceph/ -e MON_IP=172.21.64.5 -e CEPH_PUBLIC_NETWORK=172.21.64.5 $IMG ceph/mon
:
INFO[0000] Failed to add conmon to cgroupfs sandbox cgroup: creating cgroup for pids: mkdir /sys/fs/cgroup/pids/conmon: permission denied
[conmon:d]: failed to write to /proc/self/oom_score_adj: Permission denied
:
DEBU[0000] Shutting down engines
# podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
@o05:~```
--
(I have not tried escalating privileges, e.g. with su -.)
So my question is: how do you use the NVMe drives with a vstart cluster, even if it's not within a container?
(I was also looking at <https://wiki.sepia.ceph.com/doku.php?id=devplayground> for clues but was wondering whether I would mess things up).
Many thanks in advance! |
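Not a confirmed fix, but one thing worth ruling out is whether the device node is actually visible and writable inside the container: with rootless podman, --privileged cannot grant more access than the invoking user already has, so host block devices may be missing or read-only unless podman runs as root and the device is passed in explicitly. A minimal sketch along those lines (the image name and mount paths below are placeholders, not the actual setup):
```# Pass the NVMe device into the container explicitly and run podman as root,
# so vstart's writable-block-device check has a chance to succeed.
sudo podman run -it --privileged \
    --device /dev/nvme9n1 \
    -v /path/to/ceph:/ceph \
    ceph-build-image /bin/bash

# Inside the container: confirm the node exists and is writable,
# then point vstart at it as before.
test -b /dev/nvme9n1 && test -w /dev/nvme9n1 && echo "writable block device"
MDS=0 MON=1 OSD=1 MGR=1 ../src/vstart.sh --new -x --localhost \
    --without-dashboard --bluestore --redirect-output \
    --bluestore-devs /dev/nvme9n1 --crimson --no-restart```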
2024-09-03T12:33:53.501Z | <Jose J Palacios-Perez> some repos seem broken though: https://files.slack.com/files-pri/T1HG3J90S-F07KP75GWUB/download/screenshot_2024-09-03_at_13.33.04.png |
2024-09-03T12:37:08.781Z | <Jose J Palacios-Perez> ```[12:35:37]$ # dnf search cephadm --repo apt-mirror.front.sepia.ceph.com_lab-extras_8_
Last metadata expiration check: 5 days, 22:28:54 ago on Wed 28 Aug 2024 14:07:29 UTC.
No matches found.
@o05:~
[12:36:23]$ # dnf search ceph --repo apt-mirror.front.sepia.ceph.com_lab-extras_8_
======================================================================================================================== Name & Summary Matched: ceph ========================================================================================================================
centos-release-ceph-reef.noarch : Ceph Reef packages from the CentOS Storage SIG repository```: https://files.slack.com/files-pri/T1HG3J90S-F07KPAJDAM8/download/screenshot_2024-09-03_at_13.34.55.png |
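For what it's worth, a generic way to check whether any enabled repo carries a cephadm package, rather than searching one --repo at a time, is dnf provides / dnf search across all repos; a small sketch, nothing Sepia-specific assumed:
```# Ask dnf which package (and which repo) provides cephadm, across all enabled repos.
dnf provides cephadm
# Or search package names/summaries without restricting to a single repo:
dnf search cephadm```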
2024-09-03T12:37:19.320Z | <Jose J Palacios-Perez> I might try from source instead 🤞 |
2024-09-03T12:43:53.813Z | <Guillaume Abrioux> @Adam Kraitman <https://jenkins.ceph.com/job/ceph-pull-requests/142608/consoleFull#9417136787ba86caa-bd6c-4071-8005-3f6d80f92e07> I still see this issue; is it still under investigation? |
2024-09-03T12:50:55.369Z | <Jose J Palacios-Perez> Nope :cry: it probably needs an upgrade from CentOS 8 to 9, since I managed to build within the container: https://files.slack.com/files-pri/T1HG3J90S-F07KP9P93KM/download/screenshot_2024-09-03_at_13.49.43.png |
2024-09-03T14:08:56.627Z | <Matan Breizman> Hey @Adam Kraitman, can we please upgrade this machine to c9?
Thanks! |
2024-09-03T18:10:38.689Z | <Adam Kraitman> Hey @Matan Breizman I can upgrade it, but I'd prefer to do a fresh installation since it will be quicker. Can you open a tracker ticket for this task? |
2024-09-03T18:26:18.554Z | <Dan Mick> well, an issue was that a lot of protobuf files were apparently installed "by hand" (not part of any package). I don't have any idea what would have done that, or a way that I'm really comfortable with for finding and deleting them (the best I'd come up with was filtering by an m/ctime date range and then feeding each file into a "which package owns this file" check). I hope to have some time today to look at an automatic way to identify non-packaged files that's reliable and doesn't produce false positives. |
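One possible shape for that check, sketched rather than tested: on an RPM-based host, rpm -qf exits non-zero for files no installed package owns, so it can be combined with a find over a modification-time window (the directories and dates below are only examples, not the actual ranges involved):
```# List files in likely install locations that no installed RPM claims,
# narrowed to a plausible mtime window to cut down on noise.
find /usr/local/lib /usr/lib64 -type f \
    -newermt "2024-06-01" ! -newermt "2024-09-01" -print0 |
  while IFS= read -r -d '' f; do
    rpm -qf "$f" >/dev/null 2>&1 || echo "unowned: $f"
  done```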
2024-09-03T19:55:38.642Z | <Patrick Donnelly> @Dan Mick @Adam Kraitman can one of you please kick vossi04, which appears to have died? |
2024-09-03T22:05:50.691Z | <Dan Mick> [7782197.176059] libceph: mon0 (2)127.0.0.1:40676 socket error on write
[7782197.432055] libceph: mon0 (2)127.0.0.1:40676 socket error on write
vossi04 [7782197.936058] libceph: mon0 (2)127.0.0.1:40676 socket error on write
login:
[7782199.095060] libceph: mon0 (2)127.0.0.1:40676 socket error on write
vossi04 login:
vossi04 login: |
2024-09-03T22:06:23.802Z | <Dan Mick> mon2, mds0, as well |
2024-09-03T22:07:07.174Z | <Dan Mick> doesn't appear to be responding to login efforts |
2024-09-03T22:10:27.586Z | <Dan Mick> I guess I'll powercycle |
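For reference, an out-of-band power cycle is typically done through the machine's BMC; a generic ipmitool sketch, with the BMC hostname and credentials as placeholders (the Sepia lab's actual console/IPMI setup may differ):
```# Power-cycle the host via its BMC over IPMI (lanplus); the hostname, user
# and password below are placeholders, not the lab's real values.
ipmitool -I lanplus -H vossi04-bmc.example.com -U ADMIN -P 'PASSWORD' chassis power cycle

# Check the result afterwards:
ipmitool -I lanplus -H vossi04-bmc.example.com -U ADMIN -P 'PASSWORD' chassis power status```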