2024-06-14T09:54:09.142Z | <Jos Collin> @Venky Shankar Could you please approve these: <https://github.com/ceph/ceph/pull/57762>, <https://github.com/ceph/ceph/pull/57760>. No errors found in squid run: <https://tracker.ceph.com/issues/66423> |
2024-06-14T10:01:29.087Z | <Rishabh Dave> @Venky Shankar <https://github.com/ceph/ceph/pull/54620#pullrequestreview-2117728922> I agree with the problem you described in this review comment, but the solution you proposed is unclear to me. The "simplistic fix" you've described has already been implemented. |
2024-06-14T10:02:44.245Z | <Rishabh Dave> Copying "sophisticated fix" part here -
> For a more sophisticated fix, `_get_info_for_all_clones` could include the clone status and then here we select clones (and aggregate sizes) based on the status.
Clone status? How would we get the current number of allowed cloner threads through the clone status? |
2024-06-14T10:03:17.146Z | <Venky Shankar> no need to get the number of threads |
2024-06-14T10:03:25.442Z | <Venky Shankar> with each clone entry get its clone status |
2024-06-14T10:03:38.095Z | <Venky Shankar> it would be either in-progress, pending, or canceled |
2024-06-14T10:03:48.413Z | <Venky Shankar> then aggregate the size for whatever is in-progress |
2024-06-14T10:04:05.240Z | <Rishabh Dave> ah, okay. count it right then and there. |
2024-06-14T10:04:17.202Z | <Venky Shankar> then -- for the rest, aggregate the size + (aggregated size for in-progress) |
2024-06-14T10:04:23.712Z | <Venky Shankar> yeh |
2024-06-14T10:04:35.917Z | <Venky Shankar> no need to rely on max_concurrent_clones |
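(Illustration: a minimal Python sketch of the status-based aggregation Venky describes above. This is not the actual mgr/volumes code from PR 54620; the function name, the shape of the clone entries, and the exact status strings are assumptions made for the example.)

```python
# Hypothetical sketch of the "sophisticated fix": aggregate clone sizes by
# clone status instead of relying on max_concurrent_clones. The helper name,
# entry layout and status strings below are illustrative assumptions.

def aggregate_clone_sizes(clone_entries):
    """clone_entries: iterable of dicts such as
    {'name': 'clone1', 'status': 'in-progress', 'size': 1024}."""
    in_progress_total = 0
    pending_total = 0
    for entry in clone_entries:
        status = entry.get('status')
        size = entry.get('size', 0)
        if status == 'in-progress':
            # clones currently being copied
            in_progress_total += size
        elif status == 'pending':
            # queued clones, reported on top of the in-progress total
            pending_total += size
        # canceled clones contribute nothing to progress reporting
    # returns (in-progress size, in-progress + pending size)
    return in_progress_total, in_progress_total + pending_total
```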
2024-06-14T10:04:57.471Z | <Rishabh Dave> got it |
2024-06-14T10:09:15.827Z | <Rishabh Dave> we'll be adding the code to make the progress reporter thread wait for new clone jobs; see <https://github.com/ceph/ceph/pull/54620>.
in the same way, can we implement this in a separate PR? |
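(Illustration: a minimal sketch of a progress-reporter thread that waits for new clone jobs, assuming a simple condition-variable design. It is not the code in PR 54620, and all class, method, and attribute names here are hypothetical.)

```python
import threading


class CloneProgressReporter:
    """Hypothetical reporter thread that sleeps until new clone jobs arrive."""

    def __init__(self):
        self._cond = threading.Condition()
        self._jobs = []
        self._stop = False

    def add_job(self, job):
        with self._cond:
            self._jobs.append(job)
            self._cond.notify()  # wake the reporter thread

    def stop(self):
        with self._cond:
            self._stop = True
            self._cond.notify()

    def run(self):
        while True:
            with self._cond:
                # block until there is work to report on (or we are stopping)
                while not self._jobs and not self._stop:
                    self._cond.wait()
                if self._stop:
                    return
                jobs = list(self._jobs)
            self._report(jobs)  # do the reporting outside the lock

    def _report(self, jobs):
        # placeholder: a real reporter would update mgr progress events here
        pass
```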
2024-06-14T10:09:40.474Z | <Venky Shankar> yeh, I'm fine with that |
2024-06-14T10:09:46.916Z | <Venky Shankar> let's do the simplistic fix for now |
2024-06-14T10:09:51.458Z | <Rishabh Dave> cool. |
2024-06-14T10:12:32.916Z | <Rishabh Dave> the current code we have on the PR for the "simplistic fix" -- is it okay in your opinion, or does it need some improvements? |
2024-06-14T10:14:24.944Z | <Venky Shankar> i'll review once more |
2024-06-14T10:14:36.272Z | <Venky Shankar> but it's almost there 🙂 |
2024-06-14T10:14:42.995Z | <Rishabh Dave> okay |
2024-06-14T10:15:11.112Z | <Rishabh Dave> i am making the rest of the requested changes and testing with vstart_runner in the meantime |
2024-06-14T11:09:28.070Z | <Jos Collin> @Venky Shankar Does this also need a rados suite run? <https://github.com/ceph/ceph/pull/57840> |
2024-06-14T11:10:01.435Z | <Venky Shankar> No |
2024-06-14T11:12:12.486Z | <Venky Shankar> will do |
2024-06-14T11:16:10.823Z | <Jos Collin> okay |
2024-06-14T11:17:44.665Z | <Jos Collin> @Rishabh Dave You have `MDS_CACHE_OVERSIZED` in the ignorelist, but this test still fails? <https://pulpito.ceph.com/leonidus-2024-06-12_09:41:32-fs-wip-lusov-testing-20240611.123850-squid-distro-default-smithi/7751944/> |
2024-06-14T12:10:29.904Z | <Rishabh Dave> tests passed locally, i've pushed to this PR. PTAL. |
2024-06-14T12:10:53.853Z | <Rishabh Dave> i'll go through the old review comments and check if anything is left. |
2024-06-14T12:15:51.240Z | <Rishabh Dave> added where? |
2024-06-14T12:17:53.852Z | <Rishabh Dave> are you talking about this? |
2024-06-14T12:17:55.632Z | <Rishabh Dave> <https://github.com/ceph/ceph/pull/57840/files#diff-0ec2f98005b3a59e058ad5748d70a283d4cd8258de53e445e6815b4eca599339R8> |
2024-06-14T12:18:59.700Z | <Rishabh Dave> it's added to the ignorelist for `fs/functional/tasks/admin.yaml` |
2024-06-14T12:20:11.587Z | <Rishabh Dave> which is unrelated to the failing job; that job is neither `fs:workload` nor `fs:functional`. |
2024-06-14T22:38:26.738Z | <Bailey Allison> hey everyone, I'm currently waiting for account approval on the bug tracker, but I'm running into this: <https://tracker.ceph.com/issues/64852>. Just looking at the nodes to confirm the issue is on the kernel side? Would switching from the kernel client to fuse provide any benefit here? I am seeing the same logs as in that tracker too. |