ceph - cephfs - 2024-07-25

Timestamp (UTC)    Message
2024-07-25T16:50:01.768Z
<gregsfortytwo> CephFS uses the OSDs as a normal client, so it doesn’t have that kind of expertise. In general, storage systems want to fill caches to their limits, so I presume you have some limits configured too high for the amount of real memory if swap is getting used up. Many groups recommend disabling swap for Ceph servers.
2024-07-25T18:40:06.121Z
<Erich Weiler> Thanks!  Do you have any idea what limits I can tune to help this?  The caches are increasing at about 1GB/second on the OSD servers.
2024-07-25T18:49:40.118Z
<Erich Weiler> I’ve tried `vm.vfs_cache_pressure=1000` and `vm.swappiness=1` but it doesn’t seem to help much…
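
Note: the point above about limits being set too high for the real memory can be checked with simple arithmetic. The following is a minimal sketch, assuming the relevant limit is the per-daemon `osd_memory_target` option (the chat does not name a specific setting, so this is an assumption). The function name, the 20% headroom for the OS and other daemons, and the host figures are hypothetical illustration values. Keep in mind that `osd_memory_target` is a best-effort target rather than a hard cap.

```python
GIB = 1024 ** 3


def osd_memory_budget(total_ram_bytes, num_osds, osd_memory_target_bytes,
                      headroom_fraction=0.2):
    """Compare the memory requested by all OSDs on a host with usable RAM.

    headroom_fraction is a hypothetical reserve for the OS and other
    daemons; it is not a Ceph setting.
    Returns (requested_bytes, usable_bytes, fits).
    """
    requested = num_osds * osd_memory_target_bytes
    usable = int(total_ram_bytes * (1 - headroom_fraction))
    return requested, usable, requested <= usable


# Hypothetical host: 64 GiB of RAM, 12 OSDs, 8 GiB memory target each.
requested, usable, fits = osd_memory_budget(64 * GIB, 12, 8 * GIB)
print(f"requested={requested / GIB:.0f} GiB "
      f"usable={usable / GIB:.1f} GiB fits={fits}")
```

If `fits` is false, the per-OSD target (or the number of OSDs per host) is larger than the machine can back with real memory, which is the situation in which swap would be expected to fill up.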
2024-07-25T18:57:53.712Z
<Patrick Donnelly> RFR: <https://github.com/ceph/ceph/pull/58861>
2024-07-25T20:24:08.773Z
<Mark Nelson (nhm)> @Patrick Donnelly Excellent news, the first 10 seconds of testing are looking really good!
2024-07-25T20:24:47.884Z
<Mark Nelson (nhm)> I'll do a full run to make sure we don't see anything unexpected, but I think you cracked it.
2024-07-25T20:46:54.043Z
<Mark Nelson (nhm)> hrm, I may have made IO stall.  Saw this on one of the MDS logs:
2024-07-25T20:46:58.692Z
<Mark Nelson (nhm)> ```2024-07-25T20:22:20.512+0000 7fdca33fe700  3 quiesce.mds.8 <quiesce_dispatch> error (-116) submitting q-db[v:(45:0) sets:0/0] from 4365```
2024-07-25T20:47:59.988Z
<Mark Nelson (nhm)> seeing those on a couple of other mdses as well.
2024-07-25T20:48:06.131Z
<Mark Nelson (nhm)> rank 0 shows:
2024-07-25T20:48:24.400Z
<Mark Nelson (nhm)> ```2024-07-25T20:19:20.975+0000 7f0be98f2700 -1 mds.pinger is_rank_lagging: rank=0 was never sent ping request.
2024-07-25T20:22:15.981+0000 7f0be98f2700 -1 mds.pinger is_rank_lagging: rank=5 was never sent ping request.
2024-07-25T20:22:20.982+0000 7f0be98f2700 -1 mds.pinger is_rank_lagging: rank=10 was never sent ping request.
2024-07-25T20:22:25.982+0000 7f0be98f2700 -1 mds.pinger is_rank_lagging: rank=15 was never sent ping request.
2024-07-25T20:22:30.982+0000 7f0be98f2700 -1 mds.pinger is_rank_lagging: rank=20 was never sent ping request.
2024-07-25T20:39:51.666+0000 7f0bee0fb700  0 --1- [v2:172.21.67.18:6860/445211041,v1:172.21.67.18:6861/445211041] >> v1:172.21.67.16:6855/1197928650 conn(0x564a1b11ac00 0x564a1bb64000 :6861 s=OPENED pgs=14 cs=1 l=0).fault initiating reconnect
2024-07-25T20:39:56.198+0000 7f0bef8fe700  0 --1- [v2:172.21.67.18:6860/445211041,v1:172.21.67.18:6861/445211041] >> v1:172.21.67.17:6859/3016582852 conn(0x564a1b858000 0x564a1b81b800 :-1 s=OPENED pgs=17 cs=1 l=0).fault initiating reconnect```
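
Note: when many MDS logs need checking for the same pinger message, a small script can pull out the affected ranks. This is a rough sketch that assumes the log line layout shown in the paste above (timestamp, thread id, log level, then the message). The regular expression and the function name are illustrative and are not part of Ceph.

```python
import re

# Matches lines like:
# 2024-07-25T20:19:20.975+0000 7f0be98f2700 -1 mds.pinger is_rank_lagging: rank=0 was never sent ping request.
PINGER_RE = re.compile(
    r"^(?P<ts>\S+)\s+\S+\s+-?\d+\s+"
    r"mds\.pinger is_rank_lagging: rank=(?P<rank>\d+) "
    r"was never sent ping request\."
)


def never_pinged_ranks(log_text):
    """Return (timestamp, rank) pairs for each matching pinger line."""
    results = []
    for line in log_text.splitlines():
        match = PINGER_RE.match(line.strip())
        if match:
            results.append((match.group("ts"), int(match.group("rank"))))
    return results


# Two lines copied from the paste above, plus one unrelated line.
SAMPLE = """\
2024-07-25T20:19:20.975+0000 7f0be98f2700 -1 mds.pinger is_rank_lagging: rank=0 was never sent ping request.
2024-07-25T20:22:15.981+0000 7f0be98f2700 -1 mds.pinger is_rank_lagging: rank=5 was never sent ping request.
2024-07-25T20:39:51.666+0000 7f0bee0fb700  0 some unrelated messenger line
"""

for ts, rank in never_pinged_ranks(SAMPLE):
    print(ts, rank)
```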
