ceph - cephfs - 2024-11-19

Timestamp (UTC) | Message
2024-11-19T05:30:53.286Z
<Venky Shankar> @Patrick Donnelly PTAL <https://tracker.ceph.com/issues/64677#note-4> (not urgent at all, so feel free to reschedule this nudge).
2024-11-19T13:22:38.031Z
<Patrick Donnelly> responded
2024-11-19T13:29:15.407Z
<Igor Golikov> Running 5 mins late for daily
2024-11-19T13:57:29.976Z
<Markuze> Hey, folks.
Other than messenger tasks hogging the CPU, I see these errors.
It's the kernel client complaining that a CAP_OP_GRANT was received for an inode it can't find.

Has anyone seen something like this? It's a frequent occurrence.
```[Nov19 15:48] ceph: [740e6cbd-11f6-4357-89db-d4d4f82d4b61 21446]: from mds0, can't find ino fffffffffffffffe:100000176be op 0, seq 5
[  +1.307288] ceph: [740e6cbd-11f6-4357-89db-d4d4f82d4b61 21446]: from mds0, can't find ino fffffffffffffffe:100000176bf op 0, seq 5
[  +1.441070] ceph: [740e6cbd-11f6-4357-89db-d4d4f82d4b61 21446]: from mds0, can't find ino fffffffffffffffe:100000176c0 op 0, seq 5
[  +0.513919] ceph: [740e6cbd-11f6-4357-89db-d4d4f82d4b61 21446]: from mds0, can't find ino fffffffffffffffe:100000176c1 op 0, seq 5
[  +0.318858] ceph: [740e6cbd-11f6-4357-89db-d4d4f82d4b61 21446]: from mds0, can't find ino fffffffffffffffe:100000176c2 op 0, seq 5
[  +0.438481] ceph: [740e6cbd-11f6-4357-89db-d4d4f82d4b61 21446]: from mds0, can't find ino fffffffffffffffe:100000176c3 op 0, seq 5
[  +0.384798] ceph: [740e6cbd-11f6-4357-89db-d4d4f82d4b61 21446]: from mds0, can't find ino fffffffffffffffe:100000176c4 op 0, seq 5
[  +0.453058] ceph: [740e6cbd-11f6-4357-89db-d4d4f82d4b61 21446]: from mds0, can't find ino fffffffffffffffe:100000176c5 op 0, seq 5
[  +0.365346] ceph: [740e6cbd-11f6-4357-89db-d4d4f82d4b61 21446]: from mds0, can't find ino fffffffffffffffe:100000176c6 op 0, seq 5
[  +0.386730] ceph: [740e6cbd-11f6-4357-89db-d4d4f82d4b61 21446]: from mds0, can't find ino fffffffffffffffe:100000176c7 op 0, seq 5
[  +0.719176] ceph: [740e6cbd-11f6-4357-89db-d4d4f82d4b61 21446]: from mds0, can't find ino fffffffffffffffe:100000176c8 op 0, seq 5```
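For context, here is a minimal userspace sketch of the code path that emits these lines. It is a simplified model, not the actual fs/ceph/caps.c code: `CEPH_NOSNAP`, `CEPH_CAP_OP_GRANT`, and the `ceph_vino` {ino, snap} pair are real identifiers from the kernel's ceph headers, while the `cached_inos` array and the `find_inode`/`handle_cap_msg` helpers are stand-ins for the real VFS inode-cache lookup.
```c
/*
 * Simplified, userspace-compilable model of the cap-handling path that
 * emits the "can't find ino" warning. The real code looks the inode up
 * in the VFS inode cache; a tiny array stands in for that cache here.
 */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define CEPH_NOSNAP       ((uint64_t)(-2))  /* "live" (non-snapshot) inode */
#define CEPH_CAP_OP_GRANT 0                 /* mds -> client grant message */

struct ceph_vino { uint64_t ino; uint64_t snap; };

/* Stand-in for the inode-cache lookup: the client only knows cached inodes. */
static const uint64_t cached_inos[] = { 0x100000176b0, 0x100000176b1 };

static bool find_inode(struct ceph_vino vino)
{
	for (size_t i = 0; i < sizeof(cached_inos) / sizeof(cached_inos[0]); i++)
		if (cached_inos[i] == vino.ino)
			return true;
	return false;
}

static void handle_cap_msg(int mds, struct ceph_vino vino, int op, int seq)
{
	if (!find_inode(vino)) {
		/* Mirrors the dmesg lines above: snap:ino, op, seq. */
		printf("ceph: from mds%d, can't find ino %llx:%llx op %d, seq %d\n",
		       mds, (unsigned long long)vino.snap,
		       (unsigned long long)vino.ino, op, seq);
		return;
	}
	/* ...normal grant processing would update the cached caps here... */
}

int main(void)
{
	struct ceph_vino vino = { .ino = 0x100000176be, .snap = CEPH_NOSNAP };
	handle_cap_msg(0, vino, CEPH_CAP_OP_GRANT, 5);
	return 0;
}
```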
2024-11-19T15:12:49.755Z
<gregsfortytwo> What’s that -2 it’s outputting? Is that the filesystem ID or something?
2024-11-19T15:13:32.101Z
<gregsfortytwo> Looks like it’s counting up somehow, which is odd
2024-11-19T15:13:47.905Z
<Markuze> It's SNAP = CEPH_NOSNAP
2024-11-19T15:15:18.057Z
<Markuze> It prints inode.snap:inode.ino, then op and seq.
The op is always CAP_OP_GRANT = 0
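A quick decode of those two fields, assuming the constants as defined in the kernel's include/linux/ceph/ceph_fs.h (CEPH_NOSNAP is (__u64)(-2), and CEPH_CAP_OP_GRANT is 0):
```c
#include <stdio.h>
#include <stdint.h>

#define CEPH_NOSNAP       ((uint64_t)(-2))
#define CEPH_CAP_OP_GRANT 0

int main(void)
{
	/* First field of the logged "snap:ino" pair. */
	uint64_t snap = 0xfffffffffffffffeULL;
	printf("snap == CEPH_NOSNAP? %s (as signed: %lld)\n",
	       snap == CEPH_NOSNAP ? "yes" : "no", (long long)snap);
	printf("op 0 == CEPH_CAP_OP_GRANT? %s\n",
	       0 == CEPH_CAP_OP_GRANT ? "yes" : "no");
	return 0;
}
```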
2024-11-19T15:16:07.290Z
<gregsfortytwo> So it’s just counting up the inodes and the client doesn’t recognize a huge portion?
2024-11-19T15:16:37.975Z
<Markuze> So it seems.
2024-11-19T15:16:51.623Z
<gregsfortytwo> Is this a reconnect or something? I’d check what the MDS thinks is happening because this is obviously strange
2024-11-19T15:40:07.384Z
<Markuze> I'm not sure why it's happening; I'm investigating.

I found these two issues so far.

<https://tracker.ceph.com/issues/68980>
<https://tracker.ceph.com/issues/68981>

We are also failing all of the xfstests ceph/ tests, but I'm not sure if it's a test issue or a ceph issue.
2024-11-19T15:57:29.699Z
<gregsfortytwo> this is a brand-new behavior, right? Do you have any patches under test, or new stuff merged to testing?
2024-11-19T15:58:38.878Z
<gregsfortytwo> I imagine the logging is part of the reason the tests are slow, if it’s counting up from 10000000000 to 10000017cc8 (that’s 97480 inode lines printed out!)
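The arithmetic checks out, assuming the warnings cover a contiguous inode range:
```c
#include <stdio.h>

int main(void)
{
	/* Inode range gregsfortytwo estimates: 0x10000000000 .. 0x10000017cc8 */
	unsigned long long lines = 0x10000017cc8ULL - 0x10000000000ULL;
	printf("0x17cc8 = %llu warning lines\n", lines); /* prints 97480 */
	return 0;
}
```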
2024-11-19T16:30:03.306Z
<Markuze> It's not printing on every inode, but there are a bunch.
I don't know when the last stable commit was; I'll try to find it. I'll check `for-linus` because that's what we have for the upcoming 9 and 10 downstream releases.
2024-11-19T16:30:34.874Z
<gregsfortytwo> I mean this could be a server issue too, but this is very abrupt for me
2024-11-19T16:31:22.381Z
<Markuze> The tests do eventually succeed.
2024-11-19T16:33:03.276Z
<Markuze> I had BlueStore overflow and warnings, and one MDS started falling behind on trimming. I don't know; it doesn't look like a CPU or memory issue.
2024-11-19T17:40:11.722Z
<Markuze> @gregsfortytwo, running now on the `for-linus` branch: I still see CPU-hogging warnings, but they seem benign, and I don't see any of the missing-inode errors.
I'll let it run; it takes a while.
There was a crash on a cryptfs test on the testing branch; I want to see if that happens again.
