ceph - ceph-devel - 2024-11-21

Timestamp (UTC) | Message
2024-11-21T18:33:55.900Z
<Bailey Allison> hey all, running into an assert in this function here, void PrimaryLogPG::on_failed_pull. each time the same PG on an OSD tries to recover, it crashes the whole daemon; moved the PG to a different OSD, same issue. pretty sure it's the same object too but i'd have to check with coworkers on that. does anyone know why it would crash the OSD? (I will say the cluster is in a bit of a bad state in the backend too.....)
2024-11-21T18:34:37.745Z
<Bailey Allison> if need more info let me know
2024-11-21T22:27:12.009Z
<Austin Axworthy> To give an update on Bailey's issue there: we have narrowed it down to a single object on a PG (so far) that is crashing the OSD. Have it narrowed down to a possible file in a single snapshot that currently does not list.
We want to remove this object from the crashed OSD using objectstore-tool. When listing the objects in the PG we get the error

ceph-objectstore-tool --pgid 9.cf1 --data-path /var/lib/ceph/osd/ceph-2/ --op list
Error getting attr on : 9.cf1s2_head,2#9:8f300000::::head#, (61) No data available

Wondering if anyone has run into this, and has a way to delete this object from the PG through objectstore-tool or other methods.
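[For context, the usual ceph-objectstore-tool workflow for removing a single object from a PG looks roughly like the sketch below. The data path and pgid are taken from the command above; the JSON object spec is a placeholder that would normally come from a successful `--op list` run, so since that listing itself errors here, this may fail the same way and should be treated as a sketch, not a verified fix:]

```shell
# Sketch: removing one object from a PG with ceph-objectstore-tool.
# Assumes OSD id 2 and pgid 9.cf1 from the messages above; the object
# spec in step 3 is a placeholder, not a real listing result.

# 1. Stop the OSD so the object store is not in use.
systemctl stop ceph-osd@2

# 2. List objects in the PG to find the exact JSON spec of the bad object
#    (this is the step that currently fails with "No data available").
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2/ \
    --pgid 9.cf1 --op list

# 3. Remove the object by the JSON spec obtained in step 2.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2/ \
    --pgid 9.cf1 '<object-json-from-step-2>' remove

# 4. Restart the OSD and let recovery re-pull the object from peers.
systemctl start ceph-osd@2
```

[Note that these commands operate on a live OSD's store and cannot be exercised outside a cluster; the `s2` in `9.cf1s2_head` suggests an erasure-coded shard, so the same object may need attention on other shards as well.]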
