ceph - cephfs - 2024-10-11

Timestamp (UTC) | Message
2024-10-11T10:49:25.365Z
<Adam D> I can't run the MDS, it's stuck in replay. I have the following logs, can I ask for help?
```mds debug 2024-10-11T10:48:26.111+0000 7f8c34af0640  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15.000000954s                                                                                    
mds debug 2024-10-11T10:48:26.111+0000 7f8c34af0640  0 mds.beacon.ceph-filesystem-c Skipping beacon heartbeat to monitors (last acked 566.182s ago); MDS internal heartbeat is not healthy!```
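When an MDS skips beacons because its internal heartbeat is unhealthy, the usual first checks are the overall cluster health and the per-rank MDS states; the next message shows the latter. A minimal sketch, assuming a shell with the `ceph` CLI (e.g. the Rook toolbox):
```
# Cluster-wide health warnings, including MDS-related ones
ceph health detail

# Per-rank state for every filesystem (replay, replay(laggy), active, ...)
ceph fs status
```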
2024-10-11T11:04:50.249Z
<Adam D> ```RANK      STATE             MDS         ACTIVITY   DNS    INOS   DIRS   CAPS
 0    replay(laggy)  ceph-filesystem-a               0      0      0      0
 1    replay(laggy)  ceph-filesystem-c               0      0      0      0```
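`replay(laggy)` means the rank is still replaying its journal while the monitors have not recently acknowledged a beacon from it. Whether replay is actually making progress can be checked on the daemon itself through its admin socket; a sketch, assuming shell access inside the MDS container (daemon names taken from the output above):
```
# Current state as seen by the daemon itself
ceph daemon mds.ceph-filesystem-a status

# Journal counters; a rising read position suggests replay is progressing
ceph daemon mds.ceph-filesystem-a perf dump mds_log
```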
2024-10-11T11:09:51.831Z
<Eugen Block> Is the overall cluster health okay? Looks like a potential network issue between MONs and MDS.
2024-10-11T11:12:23.341Z
<Adam D> everything works except mds on one of the filesystems
2024-10-11T11:13:32.232Z
<Adam D> I have 2 ranks there, both are not progressing in replay state.
2024-10-11T11:13:34.824Z
<Adam D> ```Info: running 'ceph' command with args: [fs dump]
e2788912
btime 2024-10-11T11:01:41.685987+0000
enable_multiple, ever_enabled_multiple: 1,1
default compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
legacy client fscid: 1

Filesystem 'ceph-filesystem' (1)
fs_name	ceph-filesystem
epoch	2788910
flags	32 joinable allow_snaps allow_multimds_snaps allow_standby_replay
created	2021-09-09T14:02:36.705806+0000
modified	2024-10-11T10:57:52.087390+0000
tableserver	0
root	0
session_timeout	1200
session_autoclose	1500
max_file_size	5497558138880
max_xattr_size	65536
required_client_features	{}
last_failure	0
last_failure_osd_epoch	258528
compat	compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2,11=minor log segments,12=quiesce subvolumes}
max_mds	2
in	0,1
up	{0=386479037,1=386469271}
failed
damaged
stopped	2,3,4
data_pools	[6]
metadata_pool	3
inline_data	disabled
balancer
bal_rank_mask	-1
standby_count_wanted	1
qdb_cluster	leader: 0 members:
[mds.ceph-filesystem-a{0:386479037} state up:replay seq 1 laggy since 2024-10-11T10:57:52.087390+0000 join_fscid=1 addr [v2:10.10.10.43:6893/2481823505,v1:10.10.10.43:6894/2481823505] compat {c=[1],r=[1],i=[1fff]}]
[mds.ceph-filesystem-c{1:386469271} state up:replay seq 1 laggy since 2024-10-11T10:57:47.084316+0000 join_fscid=1 addr [v2:10.10.10.48:6898/4235092719,v1:10.10.10.48:6899/4235092719] compat {c=[1],r=[1],i=[1fff]}]


Filesystem 'ceph-us-tools' (2)
fs_name	ceph-us-tools
epoch	2788860
flags	32 joinable allow_snaps allow_multimds_snaps allow_standby_replay
created	2021-09-23T12:57:42.102604+0000
modified	2024-10-11T10:26:34.677293+0000
tableserver	0
root	0
session_timeout	60
session_autoclose	300
max_file_size	1099511627776
max_xattr_size	65536
required_client_features	{}
last_failure	0
last_failure_osd_epoch	258508
compat	compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2,11=minor log segments,12=quiesce subvolumes}
max_mds	1
in	0
up	{0=386431346}
failed
damaged
stopped	1,2
data_pools	[13]
metadata_pool	12
inline_data	disabled
balancer
bal_rank_mask	-1
standby_count_wanted	1
qdb_cluster	leader: 386431346 members: 386431346
[mds.ceph-us-tools-b{0:386431346} state up:active seq 96 join_fscid=2 addr [v2:10.10.10.48:6896/2004519844,v1:10.10.10.48:6897/2004519844] compat {c=[1],r=[1],i=[1fff]}]
[mds.ceph-us-tools-a{0:386443989} state up:standby-replay seq 1 join_fscid=2 addr [v2:10.10.10.50:6896/751435631,v1:10.10.10.50:6897/751435631] compat {c=[1],r=[1],i=[1fff]}]


Filesystem 'ceph-filesystem-ssd' (3)
fs_name	ceph-filesystem-ssd
epoch	2788912
flags	32 joinable allow_snaps allow_multimds_snaps allow_standby_replay
created	2023-09-11T10:12:31.789623+0000
modified	2024-10-11T11:01:41.635827+0000
tableserver	0
root	0
session_timeout	1200
session_autoclose	1500
max_file_size	1099511627776
max_xattr_size	65536
required_client_features	{}
last_failure	0
last_failure_osd_epoch	258515
compat	compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2,11=minor log segments,12=quiesce subvolumes}
max_mds	2
in	0,1
up	{0=386404642,1=386404648}
failed
damaged
stopped
data_pools	[16]
metadata_pool	15
inline_data	disabled
balancer
bal_rank_mask	-1
standby_count_wanted	1
qdb_cluster	leader: 386404642 members: 386404642,386404648
[mds.ceph-filesystem-ssd-d{0:386404642} state up:active seq 152 export targets 1 join_fscid=3 addr [v2:10.10.10.48:6900/2422577721,v1:10.10.10.48:6901/2422577721] compat {c=[1],r=[1],i=[1fff]}]
[mds.ceph-filesystem-ssd-a{0:386454684} state up:standby-replay seq 1 join_fscid=3 addr [v2:10.10.10.45:6896/1084778608,v1:10.10.10.45:6897/1084778608] compat {c=[1],r=[1],i=[1fff]}]
[mds.ceph-filesystem-ssd-b{1:386404648} state up:active seq 114 export targets 0 join_fscid=3 addr [v2:10.10.10.50:6898/423150121,v1:10.10.10.50:6899/423150121] compat {c=[1],r=[1],i=[1fff]}]
[mds.ceph-filesystem-ssd-c{1:386406154} state up:standby-replay seq 1 join_fscid=3 addr [v2:10.10.10.47:6896/352722522,v1:10.10.10.47:6897/352722522] compat {c=[1],r=[1],i=[1fff]}]


dumped fsmap epoch 2788912```
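For a single filesystem, the same MDSMap can be pulled without dumping everything; a small sketch using the filesystem name from the dump above:
```
# MDSMap for one filesystem only (same fields as in the full fs dump)
ceph fs get ceph-filesystem
```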
2024-10-11T11:14:53.693Z
<Adam D> `ceph tell mds.ceph-filesystem-c status` is hanging
2024-10-11T11:20:10.314Z
<Adam D> I increased mds_beacon_grace significantly and I'm waiting, but it restarts after 1200s:
```mds debug 2024-10-11T11:17:59.817+0000 7fa6f4ea7640  0 mds.beacon.ceph-filesystem-c Skipping beacon heartbeat to monitors (last acked 1285.6s ago); MDS internal heartbeat is not healthy!
mds debug 2024-10-11T11:18:00.317+0000 7fa6f4ea7640  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15.000000954s
mds debug 2024-10-11T11:18:00.317+0000 7fa6f4ea7640  0 mds.beacon.ceph-filesystem-c Skipping beacon heartbeat to monitors (last acked 1286.1s ago); MDS internal heartbeat is not healthy!```
2024-10-11T11:21:05.585Z
<Adam D> ```bash-5.1$ ceph config get mon mds_beacon_grace
3600.000000
bash-5.1$ ceph config get mds mds_beacon_grace
360000.000000```
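When an MDS is expected to sit in replay for a long time, the settings that are typically raised together are the monitor-side `mds_beacon_grace` and the daemon-side `mds_heartbeat_grace` / `mds_heartbeat_reset_grace` (the latter two also appear in the config dump posted a few messages later). A hedged sketch; the values are placeholders, not recommendations:
```
# How long the monitors wait for a beacon before treating the MDS as laggy
ceph config set mon mds_beacon_grace 3600

# Daemon-side internal heartbeat grace, and the grace before the daemon
# resets itself after missed internal heartbeats (semantics vary by release)
ceph config set mds mds_heartbeat_grace 300
ceph config set mds mds_heartbeat_reset_grace 7200
```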
2024-10-11T11:37:39.431Z
<Adam D> One rank started doing something:
```1        replay     ceph-filesystem-a             310k   271k   109k     0```
2024-10-11T11:39:32.511Z
<Eugen Block> You could turn on debug logs to see some more details, but be prepared for lots of data within only a few minutes, so make sure you have enough free disk space.
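A minimal sketch of raising MDS debug verbosity and reverting it afterwards, assuming the standard `ceph config` interface (`debug_mds 20` is extremely verbose, so keep the window short):
```
# Raise verbosity and write logs to file
ceph config set mds debug_mds 20
ceph config set mds debug_ms 1
ceph config set mds log_to_file true

# ...reproduce / observe the replay, then revert...
ceph config rm mds debug_mds      # drops the override (falls back to the previous/default value)
ceph config rm mds debug_ms
ceph config set mds log_to_file false
```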
2024-10-11T11:53:53.604Z
<Adam D> It looks like rank 1 started; it took about 1000s. Unfortunately, rank 0 is still hanging, and I'm not able to extend the parameter after which it resets, as if it were being ignored.
2024-10-11T11:54:22.840Z
<Adam D> ``` ceph config get mds
WHO     MASK  LEVEL     OPTION                              VALUE               RO
mds           advanced  debug_mds                           1/5
mds           advanced  log_to_file                         false
mds           advanced  mds_bal_interval                    10
mds           advanced  mds_beacon_grace                    360000.000000
mds           advanced  mds_client_delegate_inos_pct        0
mds           advanced  mds_deny_all_reconnect              true
mds           advanced  mds_export_ephemeral_random_max     0.010000
global        advanced  mds_heartbeat_grace                 60.000000           *
global        advanced  mds_heartbeat_reset_grace           360000
global        basic     mds_max_caps_per_client             524288
global        advanced  mds_oft_prefetch_dirfrags           false
global        advanced  mds_session_blocklist_on_evict      false
global        advanced  mds_session_blocklist_on_timeout    false
global        advanced  mds_tick_interval                   2.000000
global        advanced  mon_allow_pool_delete               true
global        advanced  mon_allow_pool_size_one             true
global        advanced  mon_cluster_log_file
mds           advanced  mon_osd_blocklist_default_expire    60.000000           *
mds           advanced  mon_pg_warn_min_per_osd             0
global        advanced  osd_pool_default_pg_autoscale_mode  on
global        advanced  osd_pool_default_pg_num             32
global        advanced  osd_pool_default_pgp_num            0
global        advanced  osd_scrub_auto_repair               true
global        advanced  rbd_default_features                3
global        advanced  rbd_default_map_options             ms_mode=prefer-crc```
2024-10-11T12:31:19.003Z
<Adam D> How can I block or extend the time before this replay reset? I increased all the beacon/heartbeat parameters, but the process still starts from the beginning after around 1200s.
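One thing worth verifying when a raised grace value seems to be ignored is whether the running daemons actually picked it up, since the centrally stored value and the value in the daemon's memory can differ. A sketch, hedged because `ceph tell` was already reported to hang against the stuck MDS earlier in this thread:
```
# What the cluster configuration database holds
ceph config get mon mds_beacon_grace
ceph config get mds mds_beacon_grace

# What a running daemon actually uses (tell may hang if the MDS is stuck;
# the admin socket inside the MDS container is the fallback)
ceph tell mds.ceph-filesystem-c config get mds_beacon_grace
ceph daemon mds.ceph-filesystem-c config get mds_beacon_grace
```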
2024-10-11T12:34:43.615Z
<Adam D> sorry to bother you, but maybe you have an idea how to force this? @Patrick Donnelly @Venky Shankar
2024-10-11T12:38:28.221Z
<Adam D> Oh, it seems the other MDS daemons (standby-replay) were making a mess; now the rank 0 MDS has moved forward.
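If the standby-replay daemons are suspected of interfering with recovery, standby-replay can be disabled per filesystem and re-enabled once the ranks are healthy again; a sketch using the filesystem name from the dump above:
```
# Stop using standby-replay for this filesystem (daemons fall back to plain standby)
ceph fs set ceph-filesystem allow_standby_replay false

# Re-enable once the active ranks are stable
ceph fs set ceph-filesystem allow_standby_replay true
```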
2024-10-11T13:16:24.093Z
<Adam D> 2 MDS are active, but the standby still has a problem with replay:
```RANK  STATE          MDS            ACTIVITY     DNS    INOS   DIRS   CAPS
 0    active  ceph-filesystem-b  Reqs:    0 /s   370k   344k   113k   239k
 1    active  ceph-filesystem-a  Reqs:    0 /s   455k   398k   113k  25.1k```
2024-10-11T19:18:04.185Z
<Adam D> After a few hours of struggle I managed to get this filesystem working. I have the impression that there are a lot of problems with switching between states. I will collect my observations; maybe you will have some good advice regarding the current configuration.
