2024-07-05T00:19:49.288Z | <Md Mahamudur Rahaman Sajib> Hi everyone, I am new to crimson (I also joined the ceph community 3 months back). Can anyone share some information: when the primary OSD goes down or an OSD is killed gracefully, what is the process by which a secondary OSD takes control of that PG? What kind of message does it use? It would be very useful if anyone could point me to the area of the code I should look into. |
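A toy, self-contained sketch of the idea in question (not Ceph or crimson code; `OSDMapToy` and `acting_set` are made-up names, and real placement uses CRUSH rather than "all up OSDs"): when the monitors publish a new OSDMap epoch with the primary marked down, each OSD recomputes the acting set for its PGs, and a changed acting set ends the current interval, with (roughly) the first surviving OSD in the acting set becoming the new primary.
```cpp
// Toy model only: each OSD compares the acting set under the old and new map
// epochs; if it changed, the interval ends and peering restarts under the new
// primary. Real Ceph computes acting sets with CRUSH and pg_temp, not like this.
#include <iostream>
#include <vector>

struct OSDMapToy {
  unsigned epoch;
  std::vector<bool> up;  // up[osd_id] == true if that OSD is up in this epoch
};

// Hypothetical placement rule for the toy: acting set = all up OSDs, in id order.
std::vector<int> acting_set(const OSDMapToy& m) {
  std::vector<int> acting;
  for (int id = 0; id < (int)m.up.size(); ++id)
    if (m.up[id]) acting.push_back(id);
  return acting;
}

int main() {
  OSDMapToy e10{10, {true, true}};   // epoch 10: osd.0 (primary) and osd.1 up
  OSDMapToy e11{11, {false, true}};  // epoch 11: monitors marked osd.0 down

  auto old_acting = acting_set(e10);
  auto new_acting = acting_set(e11);

  if (old_acting != new_acting) {
    // In Ceph this is where the interval changes and peering restarts;
    // the surviving OSD at the front of the acting set takes over as primary.
    std::cout << "interval changed at epoch " << e11.epoch
              << ", new primary is osd." << new_acting.front() << "\n";
  }
}
```
On a real cluster the same transition is visible from the CLI, e.g. `ceph pg map <pgid>` shows the up/acting sets before and after the primary is marked down.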
2024-07-05T13:26:32.411Z | <Md Mahamudur Rahaman Sajib> Hi everyone,
Whenever I kill an OSD, I get these warnings in `ceph health detail`:
```[WRN] PG_DEGRADED: Degraded data redundancy: 81/162 objects degraded (50.000%), 9 pgs degraded
pg 2.0 is active+undersized+degraded, acting [1]
pg 2.1 is active+undersized+degraded, acting [1]
pg 2.2 is active+undersized+degraded, acting [1]
pg 2.3 is active+undersized+degraded, acting [1]
pg 2.4 is active+undersized+degraded, acting [1]
pg 2.5 is active+undersized+degraded, acting [1]
pg 2.6 is active+undersized+degraded, acting [1]
pg 2.7 is active+undersized+degraded, acting [1]
pg 3.0 is active+undersized+degraded, acting [1]```
Does anyone know how `PeeringState::start_peering_interval` gets invoked and how the recovery process is kicked off?
I am actually working on this issue: <https://tracker.ceph.com/issues/61761>. From a code perspective in crimson, can anyone give a high-level overview of how the OSDs and monitors communicate with each other when the primary OSD goes down? |
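For the monitor side of that flow, a rough, hedged sketch (again a toy model with invented names such as `Monitor::report_failure` and `Osd::handle_new_map`, not the actual code paths): peer OSDs report the failure to the monitors (MOSDFailure), the monitors mark the OSD down in a new OSDMap epoch and distribute it (MOSDMap), and each surviving OSD, while advancing its PGs to the new epoch, notices the interval change and restarts peering. As far as I understand, the shared PeeringState code in src/osd/PeeringState.cc (reused by crimson) is where `start_peering_interval` comes into play.
```cpp
// Toy, self-contained model (not Ceph code) of: failure report -> new OSDMap
// epoch -> map delivery -> interval change -> re-peer with a new primary.
#include <iostream>
#include <set>

struct Monitor {
  unsigned epoch = 10;
  std::set<int> down;

  // Roughly analogous to handling MOSDFailure reports: mark the OSD down and
  // bump the map epoch (the real monitor waits for enough reporters first).
  unsigned report_failure(int osd_id) {
    down.insert(osd_id);
    return ++epoch;  // new OSDMap epoch to publish to the cluster
  }
};

struct Osd {
  int id;
  unsigned map_epoch = 10;

  // Roughly analogous to consuming an MOSDMap: advance to the new epoch and,
  // if a PG's acting set lost its primary, restart peering for that PG.
  void handle_new_map(unsigned epoch, const std::set<int>& down) {
    map_epoch = epoch;
    if (down.count(0)) {  // osd.0 was the primary in this toy example
      std::cout << "osd." << id << ": interval changed at epoch " << epoch
                << ", restarting peering and taking over as primary\n";
    }
  }
};

int main() {
  Monitor mon;
  Osd survivor{1};

  // osd.0 dies; its peers report it and the monitor publishes a new map.
  unsigned e = mon.report_failure(0);
  survivor.handle_new_map(e, mon.down);
}
```
The real flow has many more moving parts (failure-report thresholds, map trimming, the peering state machine events), but the epoch-driven "new map, interval changed, re-peer" shape is the part this toy tries to keep.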