ceph - cephfs - 2025-01-11

Timestamp (UTC)Message
2025-01-11T01:29:07.450Z
<Bailey Allison> was finally able to compile wallclock profiler and run......here's output for rank 0 mds

```Thread 840351 (MR_Finisher) - 100 samples
+ 100.00% ???
 + 100.00% pthread_cond_wait

Thread 840346 (OpHistorySvc) - 100 samples
+ 100.00% clone
 + 100.00% start_thread
  + 100.00% OpHistoryServiceThread::entry()
   + 100.00% usleep
    + 100.00% nanosleep

Thread 840349 (PQ_Finisher) - 100 samples
+ 100.00% ???
 + 100.00% pthread_cond_wait

Thread 840334 (admin_socket) - 100 samples
+ 100.00% ???
 + 100.00% AdminSocket::entry()
  + 100.00% __poll

Thread 840324 (ceph-mds) - 100 samples
+ 100.00% ???
 + 100.00% pthread_cond_wait

Thread 840342 (ceph-mds) - 100 samples
+ 100.00% ???
 + 100.00% Beacon::send()
  + 100.00% pthread_cond_timedwait

Thread 840345 (ceph_timer) - 100 samples
+ 100.00% ???
 + 100.00% ceph::timer<ceph::coarse_mono_clock>::timer_thread()
  + 100.00% pthread_cond_timedwait

Thread 840337 (io_context_pool) - 100 samples
+ 100.00% ???
 + 100.00% std:🧵:_State_impl<std:🧵:_Invoker<std::tuple<make_named_thread<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std:...
  + 100.00% _start
   + 100.00% boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::schedule...
    + 100.00% pthread_cond_wait

Thread 840336 (io_context_pool) - 100 samples
+ 100.00% ???
 + 100.00% std:🧵:_State_impl<std:🧵:_Invoker<std::tuple<make_named_thread<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std:...
  + 100.00% _start
   + 100.00% boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::schedule...
    + 100.00% pthread_cond_wait

Thread 840325 (log) - 100 samples
+ 100.00% ???
 + 100.00% pthread_cond_wait

Thread 840353 (md_submit) - 100 samples
+ 100.00% ???
 + 100.00% pthread_cond_wait

Thread 840348 (mds_rank_progr) - 100 samples
+ 100.00% ???
 + 100.00% pthread_cond_wait

Thread 840338 (ms_dispatch) - 100 samples
+ 89.00% clone
|+ 89.00% start_thread
| + 89.00% DispatchQueue::DispatchThread::entry()
|  + 89.00% DispatchQueue::entry()
|   + 89.00% Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)
|    + 89.00% MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)
|     + 89.00% MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)
|      + 89.00% MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)
|       + 89.00% MDCache::dispatch(boost::intrusive_ptr<Message const> const&)
|        + 89.00% MDCache::handle_cache_rejoin_weak(boost::intrusive_ptr<MMDSCacheRejoin const> const&)
|         + 89.00% MDCache::rejoin_scour_survivor_replicas(int, boost::intrusive_ptr<MMDSCacheRejoin const> const&, std::set<vinodeno_t, std::less<vi...
|          + 89.00% MDCache::create_subtree_map()
+ 11.00% ???

Thread 840347 (ms_dispatch) - 100 samples
+ 100.00% ???
 + 100.00% pthread_cond_wait

Thread 840339 (ms_local) - 100 samples
+ 100.00% ???
 + 100.00% pthread_cond_wait

Thread 840327 (msgr-worker-0) - 100 samples
+ 100.00% ???
 + 100.00% NetworkStack::drain()
  + 100.00% EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)
   + 100.00% EpollDriver::event_wait(std::vector<FiredFileEvent, std::allocator<FiredFileEvent> >&, timeval*)
    + 100.00% epoll_wait

Thread 840328 (msgr-worker-1) - 100 samples
+ 100.00% ???
 + 100.00% NetworkStack::drain()
  + 100.00% EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)
   + 100.00% EpollDriver::event_wait(std::vector<FiredFileEvent, std::allocator<FiredFileEvent> >&, timeval*)
    + 100.00% epoll_wait

Thread 840329 (msgr-worker-2) - 100 samples
+ 100.00% ???
 + 100.00% NetworkStack::drain()
  + 100.00% EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)
   + 100.00% EpollDriver::event_wait(std::vector<FiredFileEvent, std::allocator<FiredFileEvent> >&, timeval*)
    + 100.00% epoll_wait

Thread 840340 (safe_timer) - 100 samples
+ 100.00% clone
 + 100.00% start_thread
  + 100.00% CommonSafeTimerThread<std::mutex>::entry()
   + 100.00% CommonSafeTimer<std::mutex>::timer_thread()
    + 100.00% pthread_cond_timedwait

Thread 840344 (safe_timer) - 100 samples
+ 100.00% clone
 + 100.00% start_thread
  + 100.00% CommonSafeTimerThread<std::mutex>::entry()
   + 100.00% CommonSafeTimer<std::mutex>::timer_thread()
    + 100.00% pthread_cond_timedwait

Thread 840341 (safe_timer) - 100 samples
+ 100.00% ???
 + 100.00% pthread_cond_wait

Thread 840350 (safe_timer) - 100 samples
+ 100.00% ???
 + 100.00% pthread_cond_wait

Thread 840333 (service) - 100 samples
+ 100.00% clone
 + 100.00% start_thread
  + 100.00% ceph::common::CephContextServiceThread::entry()
   + 100.00% pthread_cond_timedwait

Thread 840335 (signal_handler) - 100 samples
+ 100.00% clone
 + 100.00% start_thread
  + 100.00% SignalHandler::entry()
   + 100.00% __poll```
2025-01-11T01:37:55.566Z
<Dan van der Ster> ```Thread 840338 (ms_dispatch) - 100 samples
+ 89.00% clone
|+ 89.00% start_thread
| + 89.00% DispatchQueue::DispatchThread::entry()
|  + 89.00% DispatchQueue::entry()
|   + 89.00% Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)
|    + 89.00% MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)
|     + 89.00% MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)
|      + 89.00% MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)
|       + 89.00% MDCache::dispatch(boost::intrusive_ptr<Message const> const&)
|        + 89.00% MDCache::handle_cache_rejoin_weak(boost::intrusive_ptr<MMDSCacheRejoin const> const&)
|         + 89.00% MDCache::rejoin_scour_survivor_replicas(int, boost::intrusive_ptr<MMDSCacheRejoin const> const&, std::set<vinodeno_t, std::less<vi...
|          + 89.00% MDCache::create_subtree_map()
+ 11.00% ???```
good job -- so that show's it's doing work. It's busy in MDCache::rejoin_scour_survivor_replicas
 <https://github.com/ceph/ceph/blob/v17.2.7/src/mds/MDCache.cc#L4609>
2025-01-11T01:48:15.366Z
<Bailey Allison> both daemons are in same state per wallclock profiler as well
2025-01-11T01:49:10.934Z
<Dan van der Ster> I updated <https://tracker.ceph.com/issues/64717#note-8>
2025-01-11T01:52:22.923Z
<Bailey Allison> thank you very much !
2025-01-11T01:53:49.268Z
<Bailey Allison> with regards to the %'s listed what does that mean in this context, seeing it appear to loop/finish ex. check it once it's 97%, check again in a few mins and it's 89%
2025-01-11T01:54:31.564Z
<Dan van der Ster> % time spent in that function.
2025-01-11T01:55:02.239Z
<Dan van der Ster> is it always
```    + 89.00% MDCache::rejoin_scour_survivor_replicas(int, boost::intrusive_ptr<MMDSCacheRejoin const> const&, std::set<vinodeno_t, std::less<vi...
|          + 89.00% MDCache::create_subtree_map()```
(not the percentages -- the function names)
2025-01-11T01:55:07.224Z
<Dan van der Ster> is it always ?
```    + 89.00% MDCache::rejoin_scour_survivor_replicas(int, boost::intrusive_ptr<MMDSCacheRejoin const> const&, std::set<vinodeno_t, std::less<vi...
|          + 89.00% MDCache::create_subtree_map()```
(not the percentages -- the function names)
2025-01-11T01:55:30.889Z
<Bailey Allison> yes it is always that function
2025-01-11T01:57:48.796Z
<Dan van der Ster> yeah okay.
rejoin_scour_survivor_replicas iterates over every inode, and every subtree_inode.
For each of those it does the work in `rejoin_scour_survivor_replicas` ... I'm not familiar with that code to figure out why it's really taking so long.

I think still it's best to wait -- let it work, feed it as much memory as it needs
2025-01-11T02:01:41.775Z
<Bailey Allison> gotcha, that we can do
2025-01-11T02:02:59.332Z
<Bailey Allison> thanks again for the help, I owe ya big for this ! If you're ever in NS beers are on me and the entire team hahahaha
2025-01-11T02:04:37.160Z
<Dan van der Ster> would love to

Any issue? please create an issue here and use the infra label.