2024-06-18T05:14:17.734Z | <Kotresh H R> checking |
2024-06-18T05:55:12.399Z | <Jos Collin> @Leonid Usov @Venky Shankar Could you please approve <https://github.com/ceph/ceph/pull/56701>. The run <https://pulpito.ceph.com/leonidus-2024-06-16_14:17:14-fs-wip-leonidus-testing-20240616.070940-reef-distro-default-smithi/> looks good. |
2024-06-18T08:53:13.855Z | <Rishabh Dave> @Kotresh H R any opinion? |
2024-06-18T10:12:35.311Z | <Rishabh Dave> @Venky Shankar There is only one review comment on the `clone stats` PR that needs fixing, the one where we want the destination path to point to the subvol UUID dir instead of the subvol base dir. I've fixed it here - <https://github.com/ceph/ceph/pull/54620/commits/b6ebc933c496108b8bc8825e2b75b5ab1c1fe4a2>. If possible, can you please take a quick look and let me know if it looks okay?
Since, according to last Friday's conversation, we're looking forward to the final QA run and merge after this, if this commit looks okay I'll proceed to do a few runs with teuth to ensure that the tests on the PR don't reveal any more race conditions. |
2024-06-18T10:13:06.707Z | <Venky Shankar> I have to get some backport work done |
2024-06-18T10:13:16.474Z | <Venky Shankar> which is urgent, so the review would have to wait till tomorrow |
2024-06-18T10:13:39.839Z | <Venky Shankar> and the qa bits need review (I haven't looked at those closely) |
2024-06-18T10:13:54.184Z | <Venky Shankar> let's not call it final QA runs yet :) |
2024-06-18T10:16:43.850Z | <Rishabh Dave> i meant i can run just the tests added in this PR a few times (because they've failed in the past due to a race condition) so that we can move a step closer to the final QA run. |
2024-06-18T10:17:05.560Z | <Venky Shankar> kk, you can continue running tests |
2024-06-18T10:17:10.655Z | <Venky Shankar> no problem |
2024-06-18T10:17:20.056Z | <Rishabh Dave> if you can review just that commit (14 LoC), it would be great. i'll wait otherwise. no issues. |
2024-06-18T10:17:42.904Z | <Venky Shankar> Better to wait -- I'll review it tomorrow in totality. |
2024-06-18T10:17:53.822Z | <Rishabh Dave> > you can continue running tests
yeah, i'm unsure if that new commit is how we want it. |
2024-06-18T10:17:55.908Z | <Rishabh Dave> cool |
2024-06-18T10:18:02.916Z | <Rishabh Dave> i'll wait. |
2024-06-18T10:44:07.394Z | <Kotresh H R> In the code, does this refer to the subvolume being cloned ? |
2024-06-18T10:44:21.431Z | <Rishabh Dave> yes |
2024-06-18T10:45:08.236Z | <Rishabh Dave> should we be concerned about the src subvol instead of the dst subvol? |
2024-06-18T10:45:24.965Z | <Kotresh H R> Then I don't get Venky's comment in the PR. How can the clone be removed with the --retain-snapshots option? |
2024-06-18T10:46:14.091Z | <Kotresh H R> But Venky's point is still valid in the sense that we should be relying on the rstats of the uuid directory rather than the subvolume base. |
2024-06-18T10:46:27.876Z | <Kotresh H R> Subvolume base does contain other metadata files too. |
2024-06-18T10:47:07.662Z | <Kotresh H R> For the src subvolume, aren't you fetching rstats from the snapshot? |
2024-06-18T10:47:19.422Z | <Rishabh Dave> i am. |
2024-06-18T10:47:57.354Z | <Venky Shankar> rstat() of base would report stats aggregated from everything under it, isn't it? That would include all snaps.. |
2024-06-18T10:47:59.131Z | <Kotresh H R> ok, in src too, it's better to fetch rstats of the subvolume root |
2024-06-18T10:48:33.120Z | <Kotresh H R> > rstat() of base would report stats aggregated from everything under it, isn't it? That would include all snaps..
Yes, right. So we should always rely on subvolume root |
2024-06-18T10:48:38.874Z | <Kotresh H R> in all the places |
2024-06-18T10:48:40.910Z | <Rishabh Dave> @Kotresh H R subvolume root meaning the UUID dir or base dir? |
2024-06-18T10:48:53.942Z | <Kotresh H R> subvolume root = UUID |
2024-06-18T10:49:09.512Z | <Rishabh Dave> got it, was confirming. |
2024-06-18T10:49:54.995Z | <Kotresh H R> I don't recollect exactly, but the snapshot path varies between subvolume versions after the retain-snapshots feature was introduced |
2024-06-18T10:50:23.172Z | <Kotresh H R> But I think the get snapshot path interface should take care of it |
2024-06-18T10:50:35.367Z | <Rishabh Dave> on the dst subvol side, fetching rstats on the subvol base dir is fine because it is still under creation, right?
although i agree it's ideal and better to fetch from the dst subvol root rather than the dst subvol base. |
2024-06-18T10:51:29.021Z | <Kotresh H R> > on dst subvol side, fetching rstats on subvol base dir is fine because it is still under creation, right?
For progress stats to be correct, both src and dst should rely on subvolume_root |
2024-06-18T10:51:35.416Z | <Rishabh Dave> > But I think the get snapshot path interface should take care of it
yes, we are using that on the src subvol side, so it's taken care of. |
2024-06-18T10:52:10.567Z | <Rishabh Dave> right. |
2024-06-18T10:52:39.289Z | <Rishabh Dave> so on src side, we already do that and i've added this commit to do that on dst subvol side as well - <https://github.com/ceph/ceph/pull/54620/commits/b6ebc933c496108b8bc8825e2b75b5ab1c1fe4a2> |
2024-06-18T10:55:54.034Z | <Kotresh H R> ok |
2024-06-18T10:59:38.334Z | <Rishabh Dave> @Kotresh H R was there some reason to not allow the `get_path` op while the subvol is not in `STATE_COMPLETE`?
i've allowed it now since we need to do the `get_path` op while the dst subvol is being created during cloning. <https://github.com/ceph/ceph/pull/54620/commits/b6ebc933c496108b8bc8825e2b75b5ab1c1fe4a2#diff-2afc20b5874dc7ad850a970bcba2be7246f04c94d6ab0a44e2140f8491feaf44R220> |
2024-06-18T10:59:46.948Z | <Rishabh Dave> @Venky Shankar In the context of this comment <https://github.com/ceph/ceph/pull/54620#discussion_r1637695939>, `dst_path` should be the path to the subvolume UUID dir, so that `rsize` doesn't include snapshots of the subvolume? |
2024-06-18T11:08:16.604Z | <Kotresh H R> get_path didn't make sense on a subvolume which is not complete yet |
2024-06-18T11:08:46.038Z | <Kotresh H R> But isn't that for the external subvolume op? |
2024-06-18T11:09:46.231Z | <Kotresh H R> Internally you can directly call the subvolume functions right ? |
2024-06-18T11:22:50.054Z | <Rishabh Dave> internally, yes, but in this case i need to open the subvol and therefore need to use the right SubvolOp |
2024-06-18T11:26:52.404Z | <Kotresh H R> ok |
2024-06-18T13:30:47.155Z | <Dhairya Parmar> @Leonid Usov I was taking a look at this tracker <https://tracker.ceph.com/issues/66107> and wanted to understand what "roots" are in this log line: `sending ack q-map[v:(19:133) roots:1/0] to the leader 4294`. Reading the `QuiesceMap` code, it looks to me like the word "root" means a node of the tree, i.e. every node in the quiesce map is a representation of the quiesce state? |
2024-06-18T13:44:56.385Z | <Leonid Usov> roots here are "quiesce roots". Those are paths into the FS that need to be quiesced, including everything below them - so, each is the root of a quiesced tree |
2024-06-18T13:45:32.353Z | <Leonid Usov> the notation above means that there was an acknowledgement carrying 1 active root and 0 inactive roots |
2024-06-18T13:46:44.721Z | <Leonid Usov> since it's a diff_ack, we can conclude that the active root that's part of this ack is in a state that differs from what the quiesce db assumes at version 19:133 |
2024-06-18T13:47:33.799Z | <Leonid Usov> usually, this means that the root got QUIESCED by this agent, while the DB still believes that the root is QUIESCING |
2024-06-18T13:51:58.221Z | <Leonid Usov> however, it could (theoretically) be the opposite: an agent may report a root as QUIESCING when the DB believes that it's QUIESCED.
This transition is valid DB-wise, though it is not supported on the agent currently. |
2024-06-18T13:52:58.035Z | <Leonid Usov> From quiescedb.h:
```template <class CharT, class Traits>
static std::basic_ostream<CharT, Traits>&
operator<<(std::basic_ostream<CharT, Traits>& os, const QuiesceMap& map)
{
size_t active = 0, inactive = 0;
for (auto&& [_, r] : map.roots) {
if (r.state < QS__TERMINAL) {
++active;
} else {
++inactive;
}
}
return os << "q-map[v:" << map.db_version << " roots:" << active << "/" << inactive << "]";
}``` |
2024-06-18T14:31:27.649Z | <Dhairya Parmar> so `QuiesceMap` is just the representation of the root states while `QuiesceSet` contains the roots that need to be quiesced |
2024-06-18T14:32:26.370Z | <Leonid Usov> correct. In the QuiesceSet, the roots are referred to as "members". For a given set, its members are necessarily unique roots, but the same root may be a member of more than one set. |
2024-06-18T14:33:21.119Z | <Leonid Usov> when the quiesce db generates a quiesce map, it scans all active quiesce sets and aggregates all set members into a unique list of quiesce roots |
2024-06-18T14:34:16.475Z | <Dhairya Parmar> and where exactly does that happen? |
2024-06-18T14:34:52.128Z | <Leonid Usov> https://files.slack.com/files-pri/T1HG3J90S-F078QLCJMK6/download/untitled.cpp |
2024-06-18T14:36:25.773Z | <Leonid Usov> that code got updated recently in <https://github.com/ceph/ceph/pull/57980>, pending review. I found an error with how the TTL is aggregated and fixed it |
2024-06-18T14:47:23.111Z | <Dhairya Parmar> ``` // for a quiesce map, we want to report active roots as either QUIESCING or RELEASING
// this is to make sure that clients always have a reason to report back and confirm
// the quiesced state.
auto requested = set.get_requested_member_state();```
the `requested` here means the requested state of all the roots inside the set `set`, right? |
2024-06-18T14:48:02.196Z | <Leonid Usov> yes |
2024-06-18T14:48:58.136Z | <Dhairya Parmar> okay, the name `get_requested_member_state` confused me a bit, `get_requested_members_state` would've been easier |
2024-06-18T14:49:23.838Z | <Dhairya Parmar> at first glance it looked like it was trying to fetch state of each member in the set |
2024-06-18T14:49:36.277Z | <Leonid Usov> i don't think this is a big enough change, so it's within the personal taste margin. I understand this as the state for a member |
2024-06-18T14:49:46.567Z | <Dhairya Parmar> okay |
2024-06-18T14:50:03.993Z | <Leonid Usov> on the other hand, you are free to refactor it |
2024-06-18T14:50:40.589Z | <Dhairya Parmar> :) |
2024-06-18T14:51:04.543Z | <Dhairya Parmar> that would be the last thing to do i guess |
2024-06-18T14:51:13.998Z | <Leonid Usov> this loop is per member, so i think it has some reasoning |
2024-06-18T14:51:50.490Z | <Leonid Usov> there is another function in the set:
``` /// @brief The effective state of a member is just a max of its parent set state and its own state
/// The exception is when the set is releasing: we want to consider any ack from peers
/// that confirms quiesced state of the member to be treated as RELEASED.
/// @param member_state the reported state of the member
/// @return the effective member state
QuiesceState get_effective_member_state(QuiesceState reported_state) const``` |
2024-06-18T14:53:15.734Z | <Leonid Usov> these two functions are named similarly. the difference is that the `effective` version takes the reported member state and combines it with the set state, whereas the `requested` version doesn't care about the reported member state |
2024-06-18T14:54:41.049Z | <Dhairya Parmar> oh okay |
2024-06-18T14:56:22.356Z | <Dhairya Parmar> `get_requested_member_state` makes sense now, the loop just means we iterate through the entire set and get the requested state, but since it is `set.get_requested_member_state()`, doesn't that mean the `requested` value would be the same for all members of that particular set? |
2024-06-18T14:59:01.123Z | <Leonid Usov> that's correct. You could take that call out of the inner loop |
2024-06-18T14:59:39.974Z | <Leonid Usov> though I'm sure the compiler is doing that for us, as that method is marked `const` |
2024-06-18T15:01:46.866Z | <Leonid Usov> well, not so sure. Probably it'd be best to move that call before the loop over the members |
2024-06-18T15:06:02.886Z | <Dhairya Parmar> agreed. although from a readability perspective, since this func is part of `QuiesceSet` and not of the `Members members` (the word "member" in the func name made me assume this initially), and since this state applies to all the members, i'd think of rewording it as `get_requested_members_state` and putting the call outside the inner loop. it would look like this now
```void QuiesceDbManager::calculate_quiesce_map(QuiesceMap &map)
{
map.roots.clear();
map.db_version = db.version();
auto db_age = db.get_age();
for(auto & [set_id, set]: db.sets) {
if (set.is_active()) {
// we only report active sets;
auto requested = set.get_requested_members_state();
for(auto & [root, member]: set.members) {
if (member.excluded) {
continue;
}
// for a quiesce map, we want to report active roots as either QUIESCING or RELEASING
// this is to make sure that clients always have a reason to report back and confirm
// the quiesced state.
auto ttl = get_root_ttl(set, member, db_age);
auto root_it = map.roots.try_emplace(root, QuiesceMap::RootInfo { requested, ttl }).first;
// the min below resolves conditions when members representing the same root have different state/ttl
// e.g. if at least one member is QUIESCING then the root should be QUIESCING
root_it->second.state = std::min(root_it->second.state, requested);
root_it->second.ttl = std::min(root_it->second.ttl, ttl);
}
}
}
}``` |
2024-06-18T15:06:14.115Z | <Dhairya Parmar> this might be hard to read, just a sec |
2024-06-18T15:06:38.082Z | <Dhairya Parmar> here <https://termbin.com/dq50> |
2024-06-18T15:07:30.300Z | <Leonid Usov> i'm reluctant about the renaming, see above, but moving the call outside the inner loop makes sense |
2024-06-18T15:07:36.796Z | <Dhairya Parmar> okay |
2024-06-18T15:08:12.770Z | <Dhairya Parmar> should this be my first quiesce db pr :) ? a minor refactor? |
2024-06-18T15:08:20.003Z | <Leonid Usov> if you go for the renaming, then the second function should be renamed, too |
2024-06-18T15:08:33.473Z | <Dhairya Parmar> i'll leave that renaming for now |
2024-06-18T15:09:17.530Z | <Leonid Usov> > should this be my first quiesce db pr :) ? a minor refactor?
I would approve such a PR if you are ready to go for it. Though please note that, with this feature being part of squid, you may need to backport all refactors |
2024-06-18T15:09:35.767Z | <Dhairya Parmar> whoops |
2024-06-18T15:10:01.918Z | <Leonid Usov> well, that's what I had to do with all my fixes |
2024-06-18T15:10:40.810Z | <Dhairya Parmar> then i think it's better to create a tracker for refactors (if more come up as i read more code) and do them all at once? |
2024-06-18T15:11:04.105Z | <Leonid Usov> yes, that would be my approach for such refactors |
2024-06-18T15:11:19.025Z | <Dhairya Parmar> alright |
2024-06-18T15:11:23.529Z | <Dhairya Parmar> thanks :) |