2024-11-21T18:33:55.900Z | <Bailey Allison> Hey all, we're running into an assert in this function: void PrimaryLogPG::on_failed_pull. Each time the same PG on an OSD starts to recover, it crashes the whole daemon. We moved the PG to a different OSD and hit the same issue. I'm pretty sure it's the same object each time too, but I'd have to check with coworkers on that. Does anyone know why it would crash the OSD? (I will say the cluster is in a bit of a bad state on the backend too.) |
2024-11-21T22:27:12.009Z | <Austin Axworthy> To give an update on Bailey's issue: we have narrowed it down to a single object in a PG (so far) that is crashing the OSD, and possibly to a file in a single snapshot that is not currently listing.
We want to remove this object from the crashed OSD using ceph-objectstore-tool. When running the list op we get this error:
ceph-objectstore-tool --pgid 9.cf1 --data-path /var/lib/ceph/osd/ceph-2/ --op list
Error getting attr on : 9.cf1s2_head,2#9:8f300000::::head#, (61) No data available
Wondering if anyone has run into this and has a way to delete this object from the PG, through ceph-objectstore-tool or other methods. |
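For reference, a minimal sketch of how a removal command of the kind asked about above is usually assembled. It assumes the documented positional form `ceph-objectstore-tool --data-path <path> --pgid <pgid> '<object JSON>' remove`, where the object description is a JSON line as printed by `--op list`. The helper name `build_remove_command` and the object name `example_object` are hypothetical and not taken from this cluster. The sketch only builds and prints the command, it does not run it, and it does not address the failing list op reported above. The tool needs the OSD to be stopped before it is used.

```python
import json
import shlex


def build_remove_command(data_path, pgid, object_json):
    """Build (but do not run) an argv list for removing one object.

    object_json is assumed to be a JSON object description in the form
    printed by `ceph-objectstore-tool --op list`.
    """
    # Fail early if the description is not valid JSON; the tool would
    # otherwise treat the string as a plain object name.
    json.loads(object_json)
    return [
        "ceph-objectstore-tool",
        "--data-path", data_path,
        "--pgid", pgid,
        object_json,
        "remove",
    ]


if __name__ == "__main__":
    # Hypothetical object description, for illustration only.
    example = json.dumps({"oid": "example_object", "pool": 9, "shard_id": 2})
    cmd = build_remove_command("/var/lib/ceph/osd/ceph-2/", "9.cf1", example)
    # Print a shell-quoted command line for review before running it by hand.
    print(shlex.join(cmd))
```

Printing the command instead of executing it leaves room to review it and to take a copy of the PG first, since the removal is destructive.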