ceph - cephfs - 2024-10-11

Timestamp (UTC) | Message
2024-10-11T10:49:25.365Z
<Adam D> I can't run the MDS, it's stuck in replay. I have the following logs, can I ask for help?
```mds debug 2024-10-11T10:48:26.111+0000 7f8c34af0640  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15.000000954s                                                                                    
mds debug 2024-10-11T10:48:26.111+0000 7f8c34af0640  0 mds.beacon.ceph-filesystem-c Skipping beacon heartbeat to monitors (last acked 566.182s ago); MDS internal heartbeat is not healthy!```
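When an MDS skips beacons because its internal heartbeat is unhealthy, the usual first checks are the overall cluster health and the per-rank MDS states; the next message shows the latter. A minimal sketch, assuming a shell with the `ceph` CLI (e.g. the Rook toolbox):
```
# Cluster-wide health warnings, including MDS-related ones
ceph health detail

# Per-rank state for every filesystem (replay, replay(laggy), active, ...)
ceph fs status
```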
2024-10-11T11:04:50.249Z
<Adam D> ```RANK      STATE             MDS         ACTIVITY   DNS    INOS   DIRS   CAPS
 0    replay(laggy)  ceph-filesystem-a               0      0      0      0
 1    replay(laggy)  ceph-filesystem-c               0      0      0      0```
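`replay(laggy)` means the rank is still replaying its journal while the monitors have not recently acknowledged a beacon from it. Whether replay is actually making progress can be checked on the daemon itself through its admin socket; a sketch, assuming shell access inside the MDS container (daemon names taken from the output above):
```
# Current state as seen by the daemon itself
ceph daemon mds.ceph-filesystem-a status

# Journal counters; a rising read position suggests replay is progressing
ceph daemon mds.ceph-filesystem-a perf dump mds_log
```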
2024-10-11T11:09:51.831Z
<Eugen Block> Is the overall cluster health okay? Looks like a potential network issue between MONs and MDS.
2024-10-11T11:12:23.341Z
<Adam D> everything works except mds on one of the filesystems
2024-10-11T11:13:32.232Z
<Adam D> I have 2 ranks there, both are not progressing in replay state.
2024-10-11T11:13:34.824Z
<Adam D> ```Info: running 'ceph' command with args: [fs dump]
e2788912
btime 2024-10-11T11:01:41.685987+0000
enable_multiple, ever_enabled_multiple: 1,1
default compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
legacy client fscid: 1

Filesystem 'ceph-filesystem' (1)
fs_name	ceph-filesystem
epoch	2788910
flags	32 joinable allow_snaps allow_multimds_snaps allow_standby_replay
created	2021-09-09T14:02:36.705806+0000
modified	2024-10-11T10:57:52.087390+0000
tableserver	0
root	0
session_timeout	1200
session_autoclose	1500
max_file_size	5497558138880
max_xattr_size	65536
required_client_features	{}
last_failure	0
last_failure_osd_epoch	258528
compat	compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2,11=minor log segments,12=quiesce subvolumes}
max_mds	2
in	0,1
up	{0=386479037,1=386469271}
failed
damaged
stopped	2,3,4
data_pools	[6]
metadata_pool	3
inline_data	disabled
balancer
bal_rank_mask	-1
standby_count_wanted	1
qdb_cluster	leader: 0 members:
[mds.ceph-filesystem-a{0:386479037} state up:replay seq 1 laggy since 2024-10-11T10:57:52.087390+0000 join_fscid=1 addr [v2:10.10.10.43:6893/2481823505,v1:10.10.10.43:6894/2481823505] compat {c=[1],r=[1],i=[1fff]}]
[mds.ceph-filesystem-c{1:386469271} state up:replay seq 1 laggy since 2024-10-11T10:57:47.084316+0000 join_fscid=1 addr [v2:10.10.10.48:6898/4235092719,v1:10.10.10.48:6899/4235092719] compat {c=[1],r=[1],i=[1fff]}]


Filesystem 'ceph-us-tools' (2)
fs_name	ceph-us-tools
epoch	2788860
flags	32 joinable allow_snaps allow_multimds_snaps allow_standby_replay
created	2021-09-23T12:57:42.102604+0000
modified	2024-10-11T10:26:34.677293+0000
tableserver	0
root	0
session_timeout	60
session_autoclose	300
max_file_size	1099511627776
max_xattr_size	65536
required_client_features	{}
last_failure	0
last_failure_osd_epoch	258508
compat	compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2,11=minor log segments,12=quiesce subvolumes}
max_mds	1
in	0
up	{0=386431346}
failed
damaged
stopped	1,2
data_pools	[13]
metadata_pool	12
inline_data	disabled
balancer
bal_rank_mask	-1
standby_count_wanted	1
qdb_cluster	leader: 386431346 members: 386431346
[mds.ceph-us-tools-b{0:386431346} state up:active seq 96 join_fscid=2 addr [v2:10.10.10.48:6896/2004519844,v1:10.10.10.48:6897/2004519844] compat {c=[1],r=[1],i=[1fff]}]
[mds.ceph-us-tools-a{0:386443989} state up:standby-replay seq 1 join_fscid=2 addr [v2:10.10.10.50:6896/751435631,v1:10.10.10.50:6897/751435631] compat {c=[1],r=[1],i=[1fff]}]


Filesystem 'ceph-filesystem-ssd' (3)
fs_name	ceph-filesystem-ssd
epoch	2788912
flags	32 joinable allow_snaps allow_multimds_snaps allow_standby_replay
created	2023-09-11T10:12:31.789623+0000
modified	2024-10-11T11:01:41.635827+0000
tableserver	0
root	0
session_timeout	1200
session_autoclose	1500
max_file_size	1099511627776
max_xattr_size	65536
required_client_features	{}
last_failure	0
last_failure_osd_epoch	258515
compat	compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2,11=minor log segments,12=quiesce subvolumes}
max_mds	2
in	0,1
up	{0=386404642,1=386404648}
failed
damaged
stopped
data_pools	[16]
metadata_pool	15
inline_data	disabled
balancer
bal_rank_mask	-1
standby_count_wanted	1
qdb_cluster	leader: 386404642 members: 386404642,386404648
[mds.ceph-filesystem-ssd-d{0:386404642} state up:active seq 152 export targets 1 join_fscid=3 addr [v2:10.10.10.48:6900/2422577721,v1:10.10.10.48:6901/2422577721] compat {c=[1],r=[1],i=[1fff]}]
[mds.ceph-filesystem-ssd-a{0:386454684} state up:standby-replay seq 1 join_fscid=3 addr [v2:10.10.10.45:6896/1084778608,v1:10.10.10.45:6897/1084778608] compat {c=[1],r=[1],i=[1fff]}]
[mds.ceph-filesystem-ssd-b{1:386404648} state up:active seq 114 export targets 0 join_fscid=3 addr [v2:10.10.10.50:6898/423150121,v1:10.10.10.50:6899/423150121] compat {c=[1],r=[1],i=[1fff]}]
[mds.ceph-filesystem-ssd-c{1:386406154} state up:standby-replay seq 1 join_fscid=3 addr [v2:10.10.10.47:6896/352722522,v1:10.10.10.47:6897/352722522] compat {c=[1],r=[1],i=[1fff]}]


dumped fsmap epoch 2788912```
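For a single filesystem, the same MDSMap can be pulled without dumping everything; a small sketch using the filesystem name from the dump above:
```
# MDSMap for one filesystem only (same fields as in the full fs dump)
ceph fs get ceph-filesystem
```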
2024-10-11T11:14:53.693Z
<Adam D> `ceph tell mds.ceph-filesystem-c status` is hanging
2024-10-11T11:20:10.314Z
<Adam D> I increased mds_beacon_grace significantly and I'm waiting, but it restarts after 1200s:
```mds debug 2024-10-11T11:17:59.817+0000 7fa6f4ea7640  0 mds.beacon.ceph-filesystem-c Skipping beacon heartbeat to monitors (last acked 1285.6s ago); MDS internal heartbeat is not healthy!
mds debug 2024-10-11T11:18:00.317+0000 7fa6f4ea7640  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15.000000954s
mds debug 2024-10-11T11:18:00.317+0000 7fa6f4ea7640  0 mds.beacon.ceph-filesystem-c Skipping beacon heartbeat to monitors (last acked 1286.1s ago); MDS internal heartbeat is not healthy!```
2024-10-11T11:21:05.585Z
<Adam D> ```bash-5.1$ ceph config get mon mds_beacon_grace
3600.000000
bash-5.1$ ceph config get mds mds_beacon_grace
360000.000000```
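When an MDS is expected to sit in replay for a long time, the settings that are typically raised together are the monitor-side `mds_beacon_grace` and the daemon-side `mds_heartbeat_grace` / `mds_heartbeat_reset_grace` (the latter two also appear in the config dump posted a few messages later). A hedged sketch; the values are placeholders, not recommendations:
```
# How long the monitors wait for a beacon before treating the MDS as laggy
ceph config set mon mds_beacon_grace 3600

# Daemon-side internal heartbeat grace, and the grace before the daemon
# resets itself after missed internal heartbeats (semantics vary by release)
ceph config set mds mds_heartbeat_grace 300
ceph config set mds mds_heartbeat_reset_grace 7200
```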
2024-10-11T11:37:39.431Z
<Adam D> One rank started doing something:
```1        replay     ceph-filesystem-a             310k   271k   109k     0```
2024-10-11T11:39:32.511Z
<Eugen Block> You could turn on debug logs to see some more details, but be prepared for lots of data within only a few minutes, so make sure you have enough free disk space.
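A minimal sketch of raising MDS debug verbosity and reverting it afterwards, assuming the standard `ceph config` interface (`debug_mds 20` is extremely verbose, so keep the window short):
```
# Raise verbosity and write logs to file
ceph config set mds debug_mds 20
ceph config set mds debug_ms 1
ceph config set mds log_to_file true

# ...reproduce / observe the replay, then revert...
ceph config rm mds debug_mds      # drops the override (falls back to the previous/default value)
ceph config rm mds debug_ms
ceph config set mds log_to_file false
```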
2024-10-11T11:53:53.604Z
<Adam D> It looks like rank 1 started; it took about 1000s. Unfortunately, rank 0 is still hanging, and I'm not able to extend the parameter after which it resets, as if it were being ignored.
2024-10-11T11:54:22.840Z
<Adam D> ``` ceph config get mds
WHO     MASK  LEVEL     OPTION                              VALUE               RO
mds           advanced  debug_mds                           1/5
mds           advanced  log_to_file                         false
mds           advanced  mds_bal_interval                    10
mds           advanced  mds_beacon_grace                    360000.000000
mds           advanced  mds_client_delegate_inos_pct        0
mds           advanced  mds_deny_all_reconnect              true
mds           advanced  mds_export_ephemeral_random_max     0.010000
global        advanced  mds_heartbeat_grace                 60.000000           *
global        advanced  mds_heartbeat_reset_grace           360000
global        basic     mds_max_caps_per_client             524288
global        advanced  mds_oft_prefetch_dirfrags           false
global        advanced  mds_session_blocklist_on_evict      false
global        advanced  mds_session_blocklist_on_timeout    false
global        advanced  mds_tick_interval                   2.000000
global        advanced  mon_allow_pool_delete               true
global        advanced  mon_allow_pool_size_one             true
global        advanced  mon_cluster_log_file
mds           advanced  mon_osd_blocklist_default_expire    60.000000           *
mds           advanced  mon_pg_warn_min_per_osd             0
global        advanced  osd_pool_default_pg_autoscale_mode  on
global        advanced  osd_pool_default_pg_num             32
global        advanced  osd_pool_default_pgp_num            0
global        advanced  osd_scrub_auto_repair               true
global        advanced  rbd_default_features                3
global        advanced  rbd_default_map_options             ms_mode=prefer-crc```
2024-10-11T12:31:19.003Z
<Adam D> How can I block or extend the time before this replay reset? I increased all the beacon/heartbeat parameters, but the process still starts from the beginning after around 1200s.
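One thing worth verifying when a raised grace value seems to be ignored is whether the running daemons actually picked it up, since the centrally stored value and the value in the daemon's memory can differ. A sketch, hedged because `ceph tell` was already reported to hang against the stuck MDS earlier in this thread:
```
# What the cluster configuration database holds
ceph config get mon mds_beacon_grace
ceph config get mds mds_beacon_grace

# What a running daemon actually uses (tell may hang if the MDS is stuck;
# the admin socket inside the MDS container is the fallback)
ceph tell mds.ceph-filesystem-c config get mds_beacon_grace
ceph daemon mds.ceph-filesystem-c config get mds_beacon_grace
```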
2024-10-11T12:34:43.615Z
<Adam D> sorry to bother you, but maybe you have an idea how to force this? @Patrick Donnelly @Venky Shankar
2024-10-11T12:38:28.221Z
<Adam D> Oh, it seems the other MDS daemons (standby-replay) were making a mess; now the rank 0 MDS has moved forward.
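If the standby-replay daemons are suspected of interfering with recovery, standby-replay can be disabled per filesystem and re-enabled once the ranks are healthy again; a sketch using the filesystem name from the dump above:
```
# Stop using standby-replay for this filesystem (daemons fall back to plain standby)
ceph fs set ceph-filesystem allow_standby_replay false

# Re-enable once the active ranks are stable
ceph fs set ceph-filesystem allow_standby_replay true
```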
2024-10-11T13:16:24.093Z
<Adam D> 2 MDS are active, but the standby still has a problem with replay:
```RANK  STATE          MDS            ACTIVITY     DNS    INOS   DIRS   CAPS
 0    active  ceph-filesystem-b  Reqs:    0 /s   370k   344k   113k   239k
 1    active  ceph-filesystem-a  Reqs:    0 /s   455k   398k   113k  25.1k```
2024-10-11T19:18:04.185Z
<Adam D> After a few hours of struggle I managed to get this filesystem working. I have the impression that there are a lot of problems with switching between states. I will collect my observations; maybe you will have some good advice regarding the current configuration.
