2024-10-21T10:11:08.034Z | <Dhairya Parmar> @Venky Shankar There's an item in the purge queue enum, `l_pq_first = 3500`. This commit <https://github.com/ceph/ceph/commit/d96d0b0aa0b81917a097956eb8e71ea1884f8a1c> added the "stats" to the purge queue, but there's no mention of the reason behind `3500`. Do you know why? |
2024-10-21T10:11:43.781Z | <Dhairya Parmar> There is no mention of this value anywhere in the source code |
2024-10-21T10:14:59.067Z | <Dhairya Parmar> or maybe there's no rationale behind this value selection 🤔 |
2024-10-21T10:22:14.453Z | <Venky Shankar> oh, I think those values are chosen a bit randomly, with the only constraint being that they need to be distinct across ceph daemons. |
2024-10-21T10:23:04.075Z | <Venky Shankar> I don't recall there being a strategy for this apart from choosing a value that's higher than the max. |
2024-10-21T10:25:23.333Z | <Dhairya Parmar> the max? you mean `mds_max_purge_ops` ? |
2024-10-21T10:25:37.683Z | <Venky Shankar> nope |
2024-10-21T10:25:47.204Z | <Venky Shankar> the max value of the enum itself |
2024-10-21T10:26:15.359Z | <Dhairya Parmar> a |
2024-10-21T10:26:26.689Z | <Dhairya Parmar> ah |
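For context, the pattern Venky is describing looks roughly like the sketch below (a minimal illustration, not the exact Ceph source; the counter names after `l_pq_first` are assumed and may differ from what the commit actually added). Each subsystem picks an arbitrary, unique base for its perf counter enum and the compiler auto-increments the rest, so `3500` only has to avoid colliding with the bases used by other subsystems.

```cpp
// Illustrative sketch of a Ceph-style perf counter enum (names assumed).
enum {
  l_pq_first = 3500,   // arbitrary base; only needs to be distinct from the
                       // bases chosen by other daemons/subsystems
  l_pq_executing_ops,  // 3501 -- subsequent counters auto-increment
  l_pq_executing,      // 3502
  l_pq_executed,       // 3503
  l_pq_last,           // end marker bounding this subsystem's counter IDs
};
```

The `first`/`last` pair bounds the range of IDs registered for the subsystem, which is why the base value itself just needs to be unique; its magnitude carries no meaning.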
2024-10-21T12:45:02.755Z | <Venky Shankar> Thx @Milind Changire for running standup. Bug triage has been moved to Wednesday for this week! |
2024-10-21T16:14:23.581Z | <erikth> I was looking to set up snapshots on a few ceph filesystems we have and I ran across this warning in the docs:
> Snapshots and multiple file systems don't interact well
Each of our cephfs has its own pool, but this bit is making me nervous:
> If each FS gets its own pool things probably work, but this isn't tested and may not be true.
Are there any more details available about this? Just trying to gauge how risky it is to actually set up snapshots on multiple filesystems. Thanks 🙏 |
2024-10-21T16:29:08.298Z | <Venky Shankar> I think the docs are outdated 😕 |
2024-10-21T16:30:06.377Z | <gregsfortytwo> It doesn’t work if your cephfs instances share pools (which is pretty hard to do these days, but was possible when that documentation was written). This is because snap deletion is done by putting the snapid/pool pair into the osdmap and letting the OSDs trim, but snapid allocation isn’t coordinated across cephfs instances, so if the pools are shared, deleting a snapshot in CephFS A will delete data in CephFS B.
I don’t think there is any problem with multiple FSes that have independent pools, but I also don’t think it’s tested much, so we still have that cautionary note |
2024-10-21T16:30:58.910Z | <Venky Shankar> yeh, the document is confusing then if it blatantly says "Snapshots and multiple file systems don't interact well". |
2024-10-21T16:34:00.581Z | <erikth> yeah, I got those quotes from this page <https://docs.ceph.com/en/squid/dev/cephfs-snapshots/> at the very bottom |
2024-10-21T16:38:54.817Z | <erikth> so it sounds like the risk of losing data using snapshots with independent pools is very low, but still not zero? I will try to do some testing of removing snapshots today and see how that goes |
2024-10-21T16:41:36.926Z | <gregsfortytwo> I wouldn’t expect any trouble, but I don’t think we test it in the lab and so wanted to make that clear since we were adding a section on how multi-fs and snapshots interact (I believe I wrote that text prior to marking CephFS as stable in Luminous) |
2024-10-21T16:42:21.585Z | <erikth> understood, thanks! |
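For anyone setting this up, a hedged sketch of the "each FS gets its own pool" layout discussed above (FS and pool names here are made up; the commands are standard ceph CLI, though exact flags can vary by release):

```bash
# Give each file system its own metadata and data pools, so snapid trimming
# for one FS never touches the other FS's pools.
ceph fs flag set enable_multiple true        # may be required before creating a second FS
ceph osd pool create fs_a_metadata
ceph osd pool create fs_a_data
ceph fs new fs_a fs_a_metadata fs_a_data

ceph osd pool create fs_b_metadata
ceph osd pool create fs_b_data
ceph fs new fs_b fs_b_metadata fs_b_data

# Snapshots are then created per-FS under a .snap directory,
# e.g. mkdir /mnt/fs_a/some/dir/.snap/mysnap
```

The key property is simply that fs_a and fs_b never share a pool, which is the scenario gregsfortytwo flags as problematic.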
2024-10-21T19:37:24.860Z | <Patrick Donnelly> Doc live: <https://docs.ceph.com/en/latest/dev/kclient/> |
2024-10-21T19:37:52.598Z | <Patrick Donnelly> note: the last step requires the changes in <https://github.com/ceph/teuthology/pull/2008> and <https://github.com/ceph/ceph/pull/60386> |