ceph - cephadm - 2024-08-28

Timestamp (UTC) | Message
2024-08-28T11:32:10.025Z
<verdurin> Owing to an unfortunate typo with the hostname, an upgraded OSD host was accepted into the cluster but as a `STRAY_HOST`. I was able to move it to the correct place in CRUSH, and have corrected the hostname on the node itself.

What's the cleanest/least disruptive way of ensuring it is "adopted" by Cephadm?
`ceph orch host add <hostname>` to effectively replace the existing entry?
2024-08-28T11:54:17.990Z
<Eugen Block> Is it currently in the orch host list?
2024-08-28T11:57:02.613Z
<verdurin> The original name is, yes.
2024-08-28T11:57:59.321Z
<verdurin> However, in CRUSH the original name is listed but empty, and the OSDs are shown against the temporarily wrong name.
2024-08-28T12:04:50.376Z
<Eugen Block> I'm not really sure if that could work, but I'm thinking about removing the original (correct) host from the host list (`ceph orch host rm <host>`). Since the OSDs are currently mapped to a different hostname, that should only remove the entry and nothing else (no OSDs are drained). Then you add the host again with the correct name. To fix the CRUSH tree you could rename the CRUSH bucket (`ceph osd crush rename-bucket`). But this is just brainstorming, I haven't had such a case yet...
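The untested idea above can be sketched as a dry-run script. It is a sketch of the brainstormed sequence, not a verified procedure: the function name `readopt_host`, the `run` helper, and the `DRY_RUN` variable are hypothetical, and the `ceph osd crush rm` step is an added assumption (the thread says the correctly named bucket still exists but is empty, so it would presumably need to go before a rename onto that name can succeed). By default the script only prints the commands.

```shell
# Sketch only: prints the commands unless DRY_RUN=0 is set explicitly.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# readopt_host CORRECT_NAME WRONG_NAME
readopt_host() {
  correct="$1"
  wrong="$2"

  # 1. Remove the stale orchestrator entry for the correct name.
  run ceph orch host rm "$correct"

  # 2. Add the host again under the correct name.
  run ceph orch host add "$correct"

  # 3. Assumption (not from the thread): drop the empty, correctly named
  #    CRUSH bucket so the rename target is free.
  run ceph osd crush rm "$correct"

  # 4. Rename the bucket that actually holds the OSDs.
  run ceph osd crush rename-bucket "$wrong" "$correct"

  # 5. Inspect the result.
  run ceph orch host ls
  run ceph osd tree
}
```

Calling `readopt_host <correct-name> <wrong-name>` in the default mode prints the six commands for review. As noted in the thread, waiting for backfilling to finish before changing anything is the cautious choice.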
2024-08-28T12:05:53.234Z
<verdurin> Yes, I don't recommend it.
2024-08-28T12:06:18.185Z
<verdurin> Am going to wait until the backfilling has completed before doing anything, anyway.
2024-08-28T12:11:02.702Z
<verdurin> Thanks for the thoughts.
2024-08-28T12:13:08.537Z
<Eugen Block> Don't mention it. I also looked into the config-key store (`ceph config-key get mgr/cephadm/host.<host>`) to see if that could be changed. It could, but I don't like fiddling with that stuff in a production cluster if I can't tell what would happen.
2024-08-28T15:43:01.472Z
<Benard> I am currently testing migration of Ceph to cephadm. I followed <https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/> to set up a mon that I ran manually with `ceph-mon -i {mon-id} --public-network {network}`, which worked fine and joined the quorum. However, when I try to adopt it with `cephadm adopt --style legacy --name mon.<hostname>`, there are some logs that show it's doing stuff, but no mon comes online. `/var/lib/ceph/mon/ceph-<hostname>` is wiped but no containers are created.

2024-08-28T15:43:12.516Z
<Benard> I tried to run with the `-v` flag but there don't seem to be any interesting logs that show an issue; the mon simply doesn't start. Am I missing something?
2024-08-28T18:05:24.263Z
<Adam King> Was anything interesting printed in `/var/log/ceph/cephadm.log` around the time of adoption?
2024-08-28T18:05:57.689Z
<Adam King> Also curious whether a `/var/lib/ceph/<fsid>/mon.<hostname>` dir was made.
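The two checks asked about above can be sketched as a small read-only script. This is an illustration under assumptions: the function name `check_adoption` and the `LOG` and `DATA_ROOT` variables are hypothetical, the default paths are the ones mentioned in the thread, and the fsid has to be supplied by the caller.

```shell
# Sketch only: read-only checks, nothing is modified.
set -eu

# Paths from the thread; overridable for testing.
LOG="${LOG:-/var/log/ceph/cephadm.log}"
DATA_ROOT="${DATA_ROOT:-/var/lib/ceph}"

# check_adoption FSID DAEMON_NAME   (e.g. DAEMON_NAME = mon.<hostname>)
check_adoption() {
  fsid="$1"
  name="$2"

  # 1. Show recent log lines that mention the daemon, if the log exists.
  if [ -f "$LOG" ]; then
    echo "== last lines in $LOG mentioning $name =="
    grep -F "$name" "$LOG" | tail -n 20 || true
  else
    echo "no log at $LOG"
  fi

  # 2. Check whether the cephadm-style data directory was created.
  if [ -d "$DATA_ROOT/$fsid/$name" ]; then
    echo "adopted dir present: $DATA_ROOT/$fsid/$name"
  else
    echo "adopted dir missing: $DATA_ROOT/$fsid/$name"
  fi
}
```

On a cluster node this would be called as `check_adoption "$(ceph fsid)" mon.<hostname>`. Running `cephadm ls` on the same host should additionally show which daemons cephadm itself sees there.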
