ceph - crimson - 2024-07-05

2024-07-05T00:19:49.288Z
<Md Mahamudur Rahaman Sajib> Hi everyone, I am new to Crimson (I also joined the Ceph community three months back). Can anyone share some information on what happens when the primary OSD goes down or is killed gracefully: by what process does a secondary OSD take control of that PG, and what kind of messages does it use? It would be pretty useful if anyone could point me to the areas of code I should look into.
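For readers following along: roughly speaking, peer OSDs report the failure to the monitors, the monitors mark the OSD down in a new OSDMap, and each affected PG restarts peering once it sees the map change, after which a surviving replica becomes the acting primary. A minimal way to watch this hand-off from the CLI is sketched below; the PG id `2.0` and `osd.0` as the primary are hypothetical, and a systemd-managed deployment is assumed.
```
# Show the up/acting sets and the current primary for a (hypothetical) PG:
ceph pg map 2.0

# Stop the (assumed) primary OSD:
systemctl stop ceph-osd@0

# Once the monitors mark osd.0 down and publish a new OSDMap, the PG
# re-peers; query it to see the new acting primary and the past intervals:
ceph pg 2.0 query
ceph health detail
```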
2024-07-05T02:19:48.825Z
<Rongqi Sun> @Matan Breizman I replied in the tracker. Does it look reasonable?
2024-07-05T13:26:32.411Z
<Md Mahamudur Rahaman Sajib> Hi everyone,
Whenever I kill an OSD, I get these warnings in `ceph health detail`:
```
[WRN] PG_DEGRADED: Degraded data redundancy: 81/162 objects degraded (50.000%), 9 pgs degraded
    pg 2.0 is active+undersized+degraded, acting [1]
    pg 2.1 is active+undersized+degraded, acting [1]
    pg 2.2 is active+undersized+degraded, acting [1]
    pg 2.3 is active+undersized+degraded, acting [1]
    pg 2.4 is active+undersized+degraded, acting [1]
    pg 2.5 is active+undersized+degraded, acting [1]
    pg 2.6 is active+undersized+degraded, acting [1]
    pg 2.7 is active+undersized+degraded, acting [1]
    pg 3.0 is active+undersized+degraded, acting [1]
```
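(A quick sanity check on the 50.000%: the `acting [1]` sets suggest a replicated pool of size 2 with one of its two OSDs down. The 81 objects then expect 2 copies each, 162 in total, and each object is missing exactly one copy, hence 81/162 = 50% degraded. The pool size is an inference from this output, not something stated in the thread.)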
Does anyone know how `PeeringState` is driven to call `start_peering_interval` and kick off the recovery process?
I am actually working on this issue: <https://tracker.ceph.com/issues/61761>. From the Crimson code's perspective, can anyone provide a high-level overview of how OSDs and monitors communicate with each other when a primary OSD goes down?
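A few hedged pointers for anyone digging into this later (based on the peering code that the classic OSD and Crimson share, not on anything confirmed in this thread): peer OSDs report a dead OSD to the monitors via `MOSDFailure`, the monitors commit a new OSDMap epoch marking it down, and when an OSD consumes that map, each affected PG notices that its up/acting set changed and calls `PeeringState::start_peering_interval` (in `src/osd/PeeringState.cc`, which Crimson reuses; the Crimson-side map and event plumbing lives under `src/crimson/osd/`). A rough way to observe the monitor side from the CLI:
```
# Stream cluster events; you should see osd.X reported failed / marked down:
ceph -w

# Compare the osdmap epoch before and after killing the primary:
ceph osd dump | head -n 1

# Bump OSD debug logging to watch peering events (classic OSD knob;
# Crimson's logging configuration may differ):
ceph tell osd.1 config set debug_osd 20
```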
2024-07-05T22:01:01.544Z
<Samuel Just> That's a pretty big topic -- I'd start by reading and understanding <https://ceph.com/assets/pdfs/weil-rados-pdsw07.pdf>
