ceph - cephadm - 2024-08-25

Timestamp (UTC)  Message
2024-08-25T09:20:00.502Z
<Laimis Juzeliūnas> Hey all, we recently went through an automated upgrade from 18.2.2 to 18.2.4, triggered from the UI with a click of a button, with no manual intervention apart from temporarily stopping scrubbing and deep scrubbing. The upgrade got stuck when it reached the MDS daemons; all prior steps completed successfully. Ceph/cephadm could not properly stop the MDS daemons and scale them down to a single rank 0 instance. The MDS would simply not evict clients, and `ceph fs status` showed the highest-rank MDS daemon in a 'stopping' state with nothing happening. Stopping MDS daemons through the UI still did not let the upgrade continue, even with a single rank 0 instance running.
We had to stop the upgrade, launch the MDS upgrade manually through the console, and order the MDS to evict clients (`ceph tell mds.* client evict`) so that it could scale down properly and the upgrade could do its job.

I was wondering if anyone has encountered anything similar, and whether this is some weird behaviour of cephadm?
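
For reference, the manual workaround described in this message might look roughly like the sketch below. Only `ceph fs status` and `ceph tell mds.* client evict` are quoted in the thread; the file system name `cephfs`, the `run` dry-run helper, the explicit `max_mds` step, and the `ceph orch upgrade stop`/`start` calls are assumptions about how the steps could be carried out, not the exact commands that were used. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch of the manual MDS workaround; dry-run by default.
set -euo pipefail

FS_NAME="${FS_NAME:-cephfs}"   # hypothetical file system name
DRY_RUN="${DRY_RUN:-1}"        # set to 0 to actually run the commands

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

mds_workaround() {
  # Stop the stuck automated upgrade (assumed command).
  run ceph orch upgrade stop
  # Ask CephFS to reduce to a single active MDS, rank 0 (assumed step).
  run ceph fs set "$FS_NAME" max_mds 1
  # Check whether the highest rank is still shown as 'stopping'.
  run ceph fs status "$FS_NAME"
  # Eviction command exactly as quoted in the thread; evicting
  # sessions can disrupt clients that have the file system mounted.
  run ceph tell 'mds.*' client evict
  # Start the upgrade again once only rank 0 remains (assumed command).
  run ceph orch upgrade start --ceph-version 18.2.4
}

mds_workaround
```

Running it as-is prints the five commands for review; the target version is taken from the upgrade mentioned in the message.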
2024-08-25T18:50:17.624Z
<Eugen Block> Unfortunately, that is still an issue with multi-active MDS. I’m not sure if there’s work in progress on that, but I sure hope so. I asked about it at Cephalocon 2023; they are aware of it, but I haven’t heard anything new since. I haven’t checked the notes for the Squid release candidate yet.
