2025-01-06T20:30:24.788Z | <gregsfortytwo> The big MDS lock is never held while network activity is happening, so that invocation just queues things up, works in memory, and then returns |
2025-01-06T20:31:22.109Z | <gregsfortytwo> This is part of our usual event loop pattern where we dispatch messages or Contexts and they return to the caller once they have to wait on input from elsewhere |
2025-01-06T20:38:02.587Z | <Md Mahamudur Rahaman Sajib> I get your point, but then why does ScrubStack still have `ceph_assert(ceph_mutex_is_locked(mdcache->mds->mds_lock))`
in the `void ScrubStack::kick_off_scrubs()` function? Doesn't that mean two scrubs cannot happen concurrently? `void ScrubStack::scrub_abort(Context *on_finish)` also has a similar
`ceph_assert(ceph_mutex_is_locked_by_me(mdcache->mds->mds_lock));` If `kick_off_scrubs` is holding that lock, how can the abort happen? The abort could only acquire the lock after the scrub finishes, couldn't it?
Okay, let me be specific about the question: say I start a scrub from the CLI, and it acquires the lock and queues the scrub job. When is that lock released: after queuing, or after the scrub finishes? |
2025-01-06T20:38:55.385Z | <gregsfortytwo> That lock is released after queuing. |
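The pattern described above can be sketched in a few lines. This is only a toy model, not Ceph code: `big_lock`, `with_big_lock`, `kick_off`, `pending`, and `run_demo` are hypothetical names, and the "waiting on input from elsewhere" part is reduced to a plain queue. It is meant to show why an "is the lock held" assertion passes inside a callback even though the lock was released after queuing: every callback is re-dispatched with the lock taken again, and the lock is dropped between steps so other work can interleave.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the big MDS lock (not the real mds_lock).
std::mutex big_lock;
// std::mutex has no "is locked" query, so the sketch tracks it by hand.
bool big_lock_held = false;

// Continuations waiting for "input from elsewhere" (modelled as a queue).
std::deque<std::function<void()>> pending;
// Order in which steps actually ran.
std::vector<std::string> trace;

// One step of a multi-step job: do in-memory work, queue the next step, return.
void kick_off(const std::string& name, int steps_left) {
  assert(big_lock_held);  // holds on the first call and in every callback
  trace.push_back(name + ":" + std::to_string(steps_left));
  if (steps_left == 0) return;
  // Queue the continuation and return; the caller then drops the lock.
  pending.push_back([=] { kick_off(name, steps_left - 1); });
}

// Every dispatch takes the lock, runs one short step, and releases it.
void with_big_lock(const std::function<void()>& fn) {
  std::lock_guard<std::mutex> guard(big_lock);
  big_lock_held = true;
  fn();
  big_lock_held = false;
}

std::vector<std::string> run_demo() {
  trace.clear();
  pending.clear();
  // Two jobs are started back to back; neither blocks the other.
  with_big_lock([] { kick_off("scrubA", 2); });
  with_big_lock([] { kick_off("scrubB", 2); });
  // The event loop re-acquires the lock for each completion callback.
  while (!pending.empty()) {
    auto cb = std::move(pending.front());
    pending.pop_front();
    with_big_lock(cb);
  }
  return trace;
}
```

Calling `run_demo()` from a `main` yields a trace in which the steps of the two jobs alternate, which is the sense in which two jobs can be in flight at once even though each individual step runs under the lock.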
2025-01-06T20:41:11.844Z | <gregsfortytwo> I suggest you look through a couple simpler operations like getattr and setattr and make sure you grok the event loop and locking rules if you’re digging in to this |
2025-01-06T20:56:54.011Z | <Md Mahamudur Rahaman Sajib> Sure, I will look into it, but here is where my confusion comes from:
```void ScrubStack::kick_off_scrubs()
{
ceph_assert(ceph_mutex_is_locked(mdcache->mds->mds_lock));```
Why is this ceph_assert there when `kick_off_scrubs` runs after queuing is done (this function is also a callback invoked after each inode scrub in the directory tree)? Shouldn't the ceph_assert fail at that point? It does not.
I also did some testing: I added a delay (for example a 100s sleep) in the scrub code and started 2 scrub jobs from 2 panels (with `./bin/ceph daemon mds.a scrub start /root1 recursive repair`). The second scrub job started exactly when the first scrub job finished (after 100s). The same thing happened for scrub abort. |
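One plausible reading of the sleep experiment, assuming the added sleep ran in code that holds the big lock, is that the delay itself caused the serialization: a thread that sleeps while holding a mutex blocks every other thread that needs it. The following sketch illustrates only that general effect. It is not Ceph code, and `demo_lock` and `run_two_commands` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a single big lock shared by all commands.
std::mutex demo_lock;

// Runs two "commands" on separate threads; each sleeps while holding the lock.
std::vector<std::string> run_two_commands(std::chrono::milliseconds delay) {
  std::vector<std::string> events;  // only touched while demo_lock is held
  auto command = [&](const std::string& name) {
    std::lock_guard<std::mutex> guard(demo_lock);
    events.push_back(name + " start");
    // Blocking here keeps the lock held, so the other command cannot start.
    std::this_thread::sleep_for(delay);
    events.push_back(name + " end");
  };
  std::thread t1(command, "scrub1");
  std::thread t2(command, "scrub2");
  t1.join();
  t2.join();
  assert(events.size() == 4);  // both commands ran to completion
  return events;
}
```

Whichever thread wins the lock first, its "start" and "end" events are always adjacent in the result, so the second command appears to start exactly when the first one finishes. Without the sleep under the lock, each step would be short and the lock would be free between steps.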
2025-01-06T22:19:59.296Z | <Md Mahamudur Rahaman Sajib> I understand now, I was completely wrong about the event loop. Thanks for the explanation. |