2024-11-19T05:30:53.286Z | <Venky Shankar> @Patrick Donnelly PTAL <https://tracker.ceph.com/issues/64677#note-4> (not urgent at all, so feel free to reschedule this nudge). |
2024-11-19T13:22:38.031Z | <Patrick Donnelly> responded |
2024-11-19T13:29:15.407Z | <Igor Golikov> Running 5 mins late for daily |
2024-11-19T13:57:29.976Z | <Markuze> Hey, folks.
Other than messenger tasks hogging the CPU, I see these errors.
It's the kernel client complaining that a CAP_OP_GRANT was received for an inode it can't find.
Has anyone seen something like this? It's a frequent occurrence.
```[Nov19 15:48] ceph: [740e6cbd-11f6-4357-89db-d4d4f82d4b61 21446]: from mds0, can't find ino fffffffffffffffe:100000176be op 0, seq 5
[ +1.307288] ceph: [740e6cbd-11f6-4357-89db-d4d4f82d4b61 21446]: from mds0, can't find ino fffffffffffffffe:100000176bf op 0, seq 5
[ +1.441070] ceph: [740e6cbd-11f6-4357-89db-d4d4f82d4b61 21446]: from mds0, can't find ino fffffffffffffffe:100000176c0 op 0, seq 5
[ +0.513919] ceph: [740e6cbd-11f6-4357-89db-d4d4f82d4b61 21446]: from mds0, can't find ino fffffffffffffffe:100000176c1 op 0, seq 5
[ +0.318858] ceph: [740e6cbd-11f6-4357-89db-d4d4f82d4b61 21446]: from mds0, can't find ino fffffffffffffffe:100000176c2 op 0, seq 5
[ +0.438481] ceph: [740e6cbd-11f6-4357-89db-d4d4f82d4b61 21446]: from mds0, can't find ino fffffffffffffffe:100000176c3 op 0, seq 5
[ +0.384798] ceph: [740e6cbd-11f6-4357-89db-d4d4f82d4b61 21446]: from mds0, can't find ino fffffffffffffffe:100000176c4 op 0, seq 5
[ +0.453058] ceph: [740e6cbd-11f6-4357-89db-d4d4f82d4b61 21446]: from mds0, can't find ino fffffffffffffffe:100000176c5 op 0, seq 5
[ +0.365346] ceph: [740e6cbd-11f6-4357-89db-d4d4f82d4b61 21446]: from mds0, can't find ino fffffffffffffffe:100000176c6 op 0, seq 5
[ +0.386730] ceph: [740e6cbd-11f6-4357-89db-d4d4f82d4b61 21446]: from mds0, can't find ino fffffffffffffffe:100000176c7 op 0, seq 5
[ +0.719176] ceph: [740e6cbd-11f6-4357-89db-d4d4f82d4b61 21446]: from mds0, can't find ino fffffffffffffffe:100000176c8 op 0, seq 5```
|
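For anyone following along, here is a minimal, self-contained userspace sketch of the check that produces those lines. It is not the actual fs/ceph/caps.c code; in the kernel the cap handler resolves the {snap, ino} pair with ceph_find_inode() and logs when nothing is cached, and all structs and helper names below are invented for illustration.
```
/* Simplified userspace model of the "can't find ino" path, not kernel code.
 * In the real client this corresponds roughly to the cap message handler
 * calling ceph_find_inode() in fs/ceph/caps.c; everything here is made up. */
#include <stdint.h>
#include <stdio.h>

#define CEPH_NOSNAP       ((uint64_t)(-2))  /* 0xfffffffffffffffe: "head" (non-snapshot) */
#define CEPH_CAP_OP_GRANT 0                 /* mds -> client cap grant/update */

struct vino    { uint64_t ino; uint64_t snap; };               /* models struct ceph_vino */
struct cap_msg { struct vino vino; int op; uint32_t seq; int mds; };

/* Stand-in for the client's inode-cache lookup (ceph_find_inode() in the kernel). */
static void *find_cached_inode(struct vino vino) { (void)vino; return NULL; }

static void handle_cap_msg(const struct cap_msg *m)
{
        void *inode = find_cached_inode(m->vino);

        if (!inode) {
                /* Roughly the condition behind the dmesg lines above: the MDS
                 * granted caps on an inode the client has no record of. */
                printf("from mds%d, can't find ino %llx:%llx op %d, seq %u\n",
                       m->mds,
                       (unsigned long long)m->vino.snap,
                       (unsigned long long)m->vino.ino,
                       m->op, (unsigned)m->seq);
                return;
        }
        /* ...otherwise the grant would be applied to the cached inode... */
}

int main(void)
{
        struct cap_msg m = {
                .vino = { .ino = 0x100000176beULL, .snap = CEPH_NOSNAP },
                .op   = CEPH_CAP_OP_GRANT, .seq = 5, .mds = 0,
        };
        handle_cap_msg(&m);
        return 0;
}
```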
2024-11-19T15:12:49.755Z | <gregsfortytwo> What’s that -1 it’s outputting — is that the Filesystem ID or something? |
2024-11-19T15:13:32.101Z | <gregsfortytwo> Looks like it’s counting up somehow which is odd |
2024-11-19T15:13:47.905Z | <Markuze> It's SNAP = CEPH_NOSNAP |
2024-11-19T15:15:18.057Z | <Markuze> It prints the inode.snap:inode, op, seq.
OP is always CAP_OP_GRANT = 0 |
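To spell out what those fields decode to, here is a standalone snippet with the two constants re-declared (values as I understand them from include/linux/ceph/ceph_fs.h, so treat them as an assumption rather than gospel):
```
/* Standalone decode of the two constants being discussed; the values are
 * re-declared here from my reading of include/linux/ceph/ceph_fs.h. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define CEPH_NOSNAP       ((uint64_t)(-2))  /* snap id for the live ("head") inode */
#define CEPH_CAP_OP_GRANT 0                 /* first entry of the cap-op enum */

int main(void)
{
        /* The leading field in "fffffffffffffffe:100000176be" is the snap id. */
        assert(CEPH_NOSNAP == 0xfffffffffffffffeULL);
        printf("snap=%llx (CEPH_NOSNAP), op=%d (CEPH_CAP_OP_GRANT), then the cap seq\n",
               (unsigned long long)CEPH_NOSNAP, CEPH_CAP_OP_GRANT);
        return 0;
}
```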
2024-11-19T15:16:07.290Z | <gregsfortytwo> So it’s just counting up the inodes and the client doesn’t recognize a huge portion? |
2024-11-19T15:16:37.975Z | <Markuze> so it seems. |
2024-11-19T15:16:51.623Z | <gregsfortytwo> Is this a reconnect or something? I’d check what the MDS thinks is happening because this is obviously strange |
2024-11-19T15:40:07.384Z | <Markuze> I'm not sure why it's happening; I'm investigating.
I found these two issues so far:
<https://tracker.ceph.com/issues/68980>
<https://tracker.ceph.com/issues/68981>
We are also failing all of the xfstests ceph/ tests, but I'm not sure if it's a test issue or a ceph issue. |
2024-11-19T15:57:29.699Z | <gregsfortytwo> this is a brand-new behavior, right? Do you have any patches under test, or new stuff merged to testing? |
2024-11-19T15:58:38.878Z | <gregsfortytwo> I imagine the logging is part of the reason the tests are slow, if it’s counting up from 10000000000 to 100000176c8 (that’s 95944 inode lines printed out!) |
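A quick back-of-the-envelope check on that count, taking the last ino shown in the paste and assuming the range starts at 0x10000000000, the usual base of MDS-allocated inode numbers (the base is an assumption here):
```
/* Rough count of the ino range implied by the dmesg paste above.
 * 0x10000000000 (2^40) as the first MDS-allocated ino is an assumption. */
#include <stdio.h>

int main(void)
{
        unsigned long long first = 0x10000000000ULL;  /* assumed first MDS-allocated ino */
        unsigned long long last  = 0x100000176c8ULL;  /* last ino in the pasted log */

        printf("%llu inos between them\n", last - first);  /* prints 95944 */
        return 0;
}
```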
2024-11-19T16:30:03.306Z | <Markuze> It's not printing for every inode, but there's a bunch.
I don't know what the last stable commit was; I'll try to find it. I'll check `for-linus`, because that's what we have for the upcoming 9 and 10 downstream releases. |
2024-11-19T16:30:34.874Z | <gregsfortytwo> I mean this could be a server issue too, but this is very abrupt for me |
2024-11-19T16:31:22.381Z | <Markuze> The tests do eventually succeed. |
2024-11-19T16:33:03.276Z | <Markuze> I had BlueStore overflow warnings, and one MDS started falling behind on trimming. I don't know; it doesn't look like a CPU or memory issue. |
2024-11-19T17:40:11.722Z | <Markuze> @gregsfortytwo, running now on the `for-linus` branch. I still see CPU-hogging warnings, but they seem to be benign, and I don't see any of the missing-inode errors.
I'll let it run; it takes a while.
There was a crash in a cryptfs test on the testing branch; I want to see if that happens again. |