2025-01-09T04:46:36.891Z | <Venky Shankar> Are clients still doing I/O? That's going to fill up the journal as it gets trimmed (slowing it down a bit more) @Austin Axworthy? |
2025-01-09T05:24:24.700Z | <Austin Axworthy> Currently all active MDS are in replay, so no client access, but it is progressing. Both have ballooned to 180GB of memory. Think it is just a waiting game at this point, ETA on replay is 30 minutes or so.
Have followed everything in <https://docs.ceph.com/en/quincy/cephfs/troubleshooting/> to help speed it up and prevent client reconnections |
2025-01-09T05:27:23.808Z | <Venky Shankar> yeh. how did you manage to end up with such a huge journal? I see it's multi-MDS, so aren't you using subtree pinning? |
2025-01-09T05:30:05.828Z | <Bailey Allison> it's got static pinning set on directories, not entirely sure how it got that big; we kind of walked into a mess on this one |
2025-01-09T05:30:55.794Z | <Venky Shankar> you still need to disable the default mds balancer, unless all directories are pinned. |
2025-01-09T05:30:57.472Z | <Bailey Allison> also we've been working on it for like 12 hours so sorry if we're saying nonsense currently hahahaha |
2025-01-09T05:31:28.006Z | <Venky Shankar> > also we've been working on it for like 12 hours so sorry if we're saying nonsense currently hahahaha
never mind -- we have all been there 🙂 |
2025-01-09T05:33:00.640Z | <Austin Axworthy> The default is disabled, but also all directories have been pinned with their nested dirs inheriting. Appreciate the help, we can see the light at the end of the tunnel! |
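(For reference: a minimal sketch of how static subtree pinning of the kind described above is typically applied. Directory paths and rank numbers are illustrative, not taken from this cluster.)
```
# Pin a directory subtree to a specific MDS rank; nested directories inherit
# the pin unless they carry their own. Paths and ranks are examples only.
setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/projects/teamA
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/projects/teamB

# Check the pin on a directory
getfattr -n ceph.dir.pin /mnt/cephfs/projects/teamA

# Remove an export pin (falls back to the parent's pin or the balancer)
setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/projects/teamB
```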
2025-01-09T06:01:29.408Z | <Austin Axworthy> Both have been in the resolve state for some time now, is there a way to get some progress on this or just wait? |
2025-01-09T06:02:37.487Z | <Venky Shankar> just wait for now - since it's multi-MDS, the peer MDSs are synchronizing cache and lock state and there isn't a nice way to estimate how much time it would take. |
2025-01-09T06:03:03.641Z | <Austin Axworthy> Thank you figured as much...wanted to be sure |
2025-01-09T06:28:24.768Z | <Adam D> Some time ago I had a similar problem; too high a value of `mds_log_max_segments` can cause a long mds replay. Additionally, I noticed there was a race in the recovery process: the mds standby-replay was able to reset the whole process after a specific timeout, which unfortunately I could not change - enabling the mds daemons one by one helped |
2025-01-09T06:29:39.555Z | <Venky Shankar> @Adam D -- right, if `mds_log_max_segments` is set too high, the mds will not start trimming log segments till the number of segments exceeds that. |
2025-01-09T06:31:32.204Z | <Venky Shankar> So, its best to leave it at whatever the default is unless there is a pressing need to change it. |
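(For context: a hedged sketch of how the trimming config and the on-disk journal header can be checked. The filesystem name and rank are placeholders, and cephfs-journal-tool should only be run against a rank whose MDS is not active.)
```
# Confirm the trimming-related setting (leaving it at the default is the advice above)
ceph config get mds mds_log_max_segments

# Inspect the journal for rank 0 of the filesystem; "header get" shows
# write_pos/expire_pos, i.e. how much journal a failover MDS must replay.
# Only run this while the rank's MDS is stopped.
cephfs-journal-tool --rank <fsname>:0 journal inspect
cephfs-journal-tool --rank <fsname>:0 header get
```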
2025-01-09T07:24:56.474Z | <Austin Axworthy> ya we're at the default for that |
2025-01-09T07:27:03.033Z | <Austin Axworthy> currently both mds are stuck in the resolve(laggy) state; checking the logs for both, they are looping on
```heartbeat_map is_healthy 'MDSRank' had timed out after 15.000000954s
mds.beacon.mdsname Skipping beacon heartbeat to monitors (last acked xxxxs ago); MDS internal heartbeat is not healthy!``` |
2025-01-09T07:27:51.479Z | <Austin Axworthy> trying to increase mds_heartbeat_grace, but neither daemon is responding to `ceph tell` injectargs or to `ceph daemon` admin-socket config commands |
2025-01-09T07:38:51.388Z | <Venky Shankar> @Austin Axworthy Use `ceph config set ...` |
2025-01-09T07:40:13.782Z | <Austin Axworthy> we have already set it with that, does that apply to the daemon live? |
2025-01-09T07:42:11.413Z | <Venky Shankar> It should -- the config gets updated in the MDS. |
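(A minimal sketch of setting and verifying the option centrally via the config database; the value and MDS id are illustrative.)
```
# Set the option cluster-wide for all MDS daemons; running daemons pick it up live
ceph config set mds mds_heartbeat_grace 3600

# Verify what the monitors hold and what a specific daemon reports
ceph config get mds mds_heartbeat_grace
ceph config show mds.<id> mds_heartbeat_grace
```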
2025-01-09T07:44:46.549Z | <Austin Axworthy> at this point, with both daemons stuck in resolve:laggy and those logs looping, and having set mds_heartbeat_grace to 3600, do you think we still just need to wait? the last acked message in the logs is at 6600 seconds for one mds and about 2800 seconds for the other. in addition, are there any other config settings we should look at enabling? |
2025-01-09T07:48:54.258Z | <Austin Axworthy> in addition, we're now seeing this in the mds logs too, even though time is synced correctly across all of the servers in the cluster
```ceph monclient: check_auth_rotation possible clock skew rotating keys expired way too early``` |
2025-01-09T07:50:35.713Z | <Venky Shankar> the monclient warning is out of my wheelhouse really. |
2025-01-09T07:51:02.469Z | <Austin Axworthy> I have seen that before in other issues with ceph so I feel it's not related in this case either |
2025-01-09T07:51:10.569Z | <Austin Axworthy> I've seen it in osd logs and stuff before |
2025-01-09T07:52:24.569Z | <Venky Shankar> At this point try the suggestions in: <https://docs.ceph.com/en/quincy/cephfs/troubleshooting/#avoiding-recovery-roadblocks> (in case some of the recommended configs are not set as per the doc). |
2025-01-09T07:56:52.066Z | <Austin Axworthy> for better or for worse, we have already set all of these configs earlier; the only one we didn't have was mds_heartbeat_grace, the one we were trying to set just now |
2025-01-09T07:57:42.850Z | <Austin Axworthy> given the 15s timeout and the couple of hours of looping we are seeing in the logs, is it likely that any progress is being made in resolve(laggy), or is it just stuck where it is? |
2025-01-09T08:06:03.263Z | <Venky Shankar> another option is to reduce max_mds to 1 and get the MDS up:active to work around the resolve state. Since the journal replay was completed, the read and write pos should have been updated in the header. |
2025-01-09T08:06:05.695Z | <Venky Shankar> It's pretty much stuck, I think. |
2025-01-09T08:11:12.242Z | <Venky Shankar> Huge journal sizes have always resulted in problematic mds failovers -- unfortunately, you have run into that. |
2025-01-09T08:11:34.033Z | <Austin Axworthy> ya it does appear to be stuck |
2025-01-09T08:12:02.879Z | <Austin Axworthy> we'll try reducing max mds to 1 |
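(A sketch of the max_mds reduction being attempted here; the filesystem name is a placeholder.)
```
# Reduce the number of active MDS ranks for the filesystem to 1
ceph fs set <fsname> max_mds 1

# Watch the rank states while the remaining MDS tries to reach up:active
ceph fs status <fsname>
ceph status
```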
2025-01-09T08:19:40.503Z | <Adam D> Try disabling all mds and starting only one; when recovery (up:active) finishes, start another one, etc. I also had a [problem](https://ceph-storage.slack.com/archives/C04LVQMHM9B/p1728643765342399) with the heartbeat and the recovery process being reset/blocked. |
2025-01-09T09:16:19.096Z | <Dhairya Parmar> @Venky Shankar (re: the async read with filer for non-contiguous i/o) i've updated the tracker with my findings <https://tracker.ceph.com/issues/69315#note-12>. Do let me know if you're available post-standup to discuss a bit on this. (TL;DR - code looks fine, Filer and OC are treating bufferlist differently. Tweaking the test case worked.) |
2025-01-09T13:45:13.938Z | <Md Mahamudur Rahaman Sajib> Hi Folks, I have a question regarding ceph_getdents. If I iterate over the dentries using ceph_getdents and delete some subdirectory/file on the fly (depending on some condition), will it affect the ceph_getdents function (maybe skip some dentry)? Is it expected?
```while (true) {
  // Read the next batch of dirents into the caller-provided buffer `dire`.
  int len = ceph_getdents(mnt, dirp, (char *)dire, 512);
  if (len < 0) {
    cout << "failed to read directory" << endl;
    break;
  }
  if (len == 0) break;  // end of directory
  int nr = len / sizeof(struct dirent);
  for (int i = 0; i < nr; ++i) {
    std::string d_name = std::string(dire[i].d_name);
    if (d_name == "." || d_name == "..") continue;
    if (check_failed(d_name)) {
      // delete this file or subdirectory while the iteration is still in progress
    }
  }
}``` |
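(One hedged way to sidestep the ambiguity in the snippet above is to collect the matching names during iteration and perform the deletions only after the directory walk has finished. The sketch below assumes an already-mounted `ceph_mount_info`, an open directory handle, and the snippet's hypothetical `check_failed()` predicate.)
```
#include <cephfs/libcephfs.h>
#include <dirent.h>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical predicate from the snippet above, provided by the caller.
bool check_failed(const std::string &name);

// Collect entries first, delete afterwards, so the deletions cannot interact
// with the in-progress ceph_getdents() iteration.
void prune_failed_entries(struct ceph_mount_info *mnt,
                          struct ceph_dir_result *dirp,
                          const std::string &dir_path) {
  struct dirent dire[8];
  std::vector<std::string> to_delete;

  while (true) {
    int len = ceph_getdents(mnt, dirp, (char *)dire, sizeof(dire));
    if (len < 0) {
      std::cout << "failed to read directory" << std::endl;
      break;
    }
    if (len == 0) break;  // end of directory
    int nr = len / sizeof(struct dirent);
    for (int i = 0; i < nr; ++i) {
      std::string d_name = dire[i].d_name;
      if (d_name == "." || d_name == "..") continue;
      if (check_failed(d_name)) to_delete.push_back(d_name);
    }
  }

  // Deletions happen only after the iteration is complete.
  for (const auto &name : to_delete) {
    std::string path = dir_path + "/" + name;
    // ceph_unlink() removes a file; an (empty) subdirectory needs ceph_rmdir().
    if (ceph_unlink(mnt, path.c_str()) < 0) {
      ceph_rmdir(mnt, path.c_str());
    }
  }
}
```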
2025-01-09T13:46:51.039Z | <Patrick Donnelly> @Milind Changire which backport do you need me to look at? |
2025-01-09T13:47:06.395Z | <Milind Changire> squid |
2025-01-09T13:47:48.178Z | <Milind Changire> @Patrick Donnelly ^^ |
2025-01-09T13:48:13.666Z | <Milind Changire> fyi - <https://tracker.ceph.com/issues/68691> |
2025-01-09T13:57:11.245Z | <Patrick Donnelly> there are conflicts, I'll check why |
2025-01-09T13:57:16.354Z | <Patrick Donnelly> in the mean time: @Milind Changire <https://tracker.ceph.com/issues/68651#note-5> |
2025-01-09T13:57:46.813Z | <Patrick Donnelly> @Venky Shankar do we need <https://tracker.ceph.com/issues/68653> ? |
2025-01-09T13:58:06.470Z | <Patrick Donnelly> doesn't look like @Milind Changire has a backport yet and I'm questioning if it's worth the trouble |
2025-01-09T13:58:27.091Z | <Patrick Donnelly> also quincy is technically EOL |
2025-01-09T14:00:07.875Z | <Patrick Donnelly> well, milind says he has a backport for quincy: <https://tracker.ceph.com/issues/68652#note-3> |
2025-01-09T14:00:13.163Z | <Patrick Donnelly> @Milind Changire please update the tickets |
2025-01-09T14:01:25.447Z | <Venky Shankar> > do we need <https://tracker.ceph.com/issues/68653> ?
not really I think. On Monday's CLT, it was kind of decided that there would be no next Q point-release. |
2025-01-09T14:02:48.140Z | <Venky Shankar> > well, milind says he has a backport for quincy: <https://tracker.ceph.com/issues/68652#note-3>
maybe that was coming for some downstream stuff, so that we can test and cherry-pick patches. @Milind Changire? |
2025-01-09T16:19:29.315Z | <Patrick Donnelly> @Milind Changire @Venky Shankar FYI: <https://tracker.ceph.com/issues/68926> |
2025-01-09T16:51:03.085Z | <Milind Changire> reef trackers status has been set to Resolved |
2025-01-09T16:54:29.055Z | <Milind Changire> ticket has been updated to Resolved |