2024-06-12T02:02:03.313Z | <Xiubo Li> @Patrick Donnelly @Venky Shankar BTW, do you know which MDS option makes dir fragments get balanced and migrated to other MDSs quickly? |
2024-06-12T04:06:26.735Z | <Jos Collin> this doesn't seem to be an issue with mirroring
```<error>
<unique>0x2bb</unique>
<tid>1</tid>
<kind>Leak_StillReachable</kind>
<xwhat>
<text>36 bytes in 1 blocks are still reachable in loss record 3 of 11</text>
<leakedbytes>36</leakedbytes>
<leakedblocks>1</leakedblocks>
</xwhat>
<stack>
<frame>
<ip>0x484480F</ip>
<obj>/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so</obj>
<fn>malloc</fn>
<dir>/builddir/build/BUILD/valgrind-3.22.0/coregrind/m_replacemalloc</dir>
<file>vg_replace_malloc.c</file>
<line>442</line>
</frame>
<frame>
<ip>0x402382F</ip>
<obj>/usr/lib64/ld-linux-x86-64.so.2</obj>
<fn>malloc</fn>
<dir>/usr/src/debug/glibc-2.34-105.el9.x86_64/string/../include</dir>
<file>rtld-malloc.h</file>
<line>56</line>
</frame>
<frame>
<ip>0x402382F</ip>
<obj>/usr/lib64/ld-linux-x86-64.so.2</obj>
<fn>strdup</fn>
<dir>/usr/src/debug/glibc-2.34-105.el9.x86_64/string</dir>
<file>strdup.c</file>
<line>42</line>
</frame>
<frame>
<ip>0x4014677</ip>
<obj>/usr/lib64/ld-linux-x86-64.so.2</obj>
<fn>_dl_load_cache_lookup</fn>
<dir>/usr/src/debug/glibc-2.34-105.el9.x86_64/elf</dir>
<file>dl-cache.c</file>
<line>517</line>
</frame>
<frame>
<ip>0x40089B7</ip>
<obj>/usr/lib64/ld-linux-x86-64.so.2</obj>
<fn>_dl_map_object</fn>
<dir>/usr/src/debug/glibc-2.34-105.el9.x86_64/elf</dir>
<file>dl-load.c</file>
<line>2152</line>
</frame>
<frame>
<ip>0x400C3B9</ip>
<obj>/usr/lib64/ld-linux-x86-64.so.2</obj>
<fn>dl_open_worker_begin</fn>
<dir>/usr/src/debug/glibc-2.34-105.el9.x86_64/elf</dir>
<file>dl-open.c</file>
<line>577</line>
</frame>
<frame>
<ip>0x5B8F147</ip>
<obj>/usr/lib64/libc.so.6</obj>
<fn>_dl_catch_exception</fn>
<dir>/usr/src/debug/glibc-2.34-105.el9.x86_64/elf</dir>
<file>dl-error-skeleton.c</file>
<line>208</line>
</frame>
<frame>
<ip>0x400BAF9</ip>
<obj>/usr/lib64/ld-linux-x86-64.so.2</obj>
<fn>dl_open_worker</fn>
<dir>/usr/src/debug/glibc-2.34-105.el9.x86_64/elf</dir>
<file>dl-open.c</file>
<line>796</line>
</frame>
<frame>
<ip>0x5B8F147</ip>
<obj>/usr/lib64/libc.so.6</obj>
<fn>_dl_catch_exception</fn>
<dir>/usr/src/debug/glibc-2.34-105.el9.x86_64/elf</dir>
<file>dl-error-skeleton.c</file>
<line>208</line>
</frame>
<frame>
<ip>0x400BF5E</ip>
<obj>/usr/lib64/ld-linux-x86-64.so.2</obj>
<fn>_dl_open</fn>
<dir>/usr/src/debug/glibc-2.34-105.el9.x86_64/elf</dir>
<file>dl-open.c</file>
<line>898</line>
</frame>
<frame>
<ip>0x5ABECBB</ip>
<obj>/usr/lib64/libc.so.6</obj>
<fn>dlopen_doit</fn>
<dir>/usr/src/debug/glibc-2.34-105.el9.x86_64/dlfcn</dir>
<file>dlopen.c</file>
<line>56</line>
</frame>
<frame>
<ip>0x5B8F147</ip>
<obj>/usr/lib64/libc.so.6</obj>
<fn>_dl_catch_exception</fn>
<dir>/usr/src/debug/glibc-2.34-105.el9.x86_64/elf</dir>
<file>dl-error-skeleton.c</file>
<line>208</line>
</frame>
<frame>
<ip>0x5B8F212</ip>
<obj>/usr/lib64/libc.so.6</obj>
<fn>_dl_catch_error</fn>
<dir>/usr/src/debug/glibc-2.34-105.el9.x86_64/elf</dir>
<file>dl-error-skeleton.c</file>
<line>227</line>
</frame>
<frame>
<ip>0x5ABE78D</ip>
<obj>/usr/lib64/libc.so.6</obj>
<fn>_dlerror_run</fn>
<dir>/usr/src/debug/glibc-2.34-105.el9.x86_64/dlfcn</dir>
<file>dlerror.c</file>
<line>138</line>
</frame>
<frame>
<ip>0x5ABED70</ip>
<obj>/usr/lib64/libc.so.6</obj>
<fn>dlopen_implementation</fn>
<dir>/usr/src/debug/glibc-2.34-105.el9.x86_64/dlfcn</dir>
<file>dlopen.c</file>
<line>71</line>
</frame>
<frame>
<ip>0x5ABED70</ip>
<obj>/usr/lib64/libc.so.6</obj>
<fn>dlopen@@GLIBC_2.34</fn>
<dir>/usr/src/debug/glibc-2.34-105.el9.x86_64/dlfcn</dir>
<file>dlopen.c</file>
<line>81</line>
</frame>
<frame>
<ip>0x5039C0F</ip>
<obj>/usr/lib64/ceph/libceph-common.so.2</obj>
<fn>_sub_I_65535_0.0</fn>
</frame>
<frame>
<ip>0x400507D</ip>
<obj>/usr/lib64/ld-linux-x86-64.so.2</obj>
<fn>call_init</fn>
<dir>/usr/src/debug/glibc-2.34-105.el9.x86_64/elf</dir>
<file>dl-init.c</file>
<line>70</line>
</frame>
<frame>
<ip>0x400507D</ip>
<obj>/usr/lib64/ld-linux-x86-64.so.2</obj>
<fn>call_init</fn>
<dir>/usr/src/debug/glibc-2.34-105.el9.x86_64/elf</dir>
<file>dl-init.c</file>
<line>26</line>
</frame>
<frame>
<ip>0x400516B</ip>
<obj>/usr/lib64/ld-linux-x86-64.so.2</obj>
<fn>_dl_init</fn>
<dir>/usr/src/debug/glibc-2.34-105.el9.x86_64/elf</dir>
<file>dl-init.c</file>
<line>117</line>
</frame>
<frame>
<ip>0x401CC29</ip>
<obj>/usr/lib64/ld-linux-x86-64.so.2</obj>
</frame>
<frame>
<ip>0x4</ip>
</frame>
<frame>
<ip>0x1FFF000B82</ip>
</frame>
<frame>
<ip>0x1FFF000B90</ip>
</frame>
<frame>
<ip>0x1FFF000B9A</ip>
</frame>
<frame>
<ip>0x1FFF000B9F</ip>
</frame>
<frame>
<ip>0x1FFF000BA4</ip>
</frame>
</stack>
</error>
</valgrindoutput>``` |
2024-06-12T04:06:48.078Z | <Jos Collin> @Venky Shankar |
2024-06-12T05:37:22.900Z | <Venky Shankar> replication factor? |
2024-06-12T05:37:33.016Z | <Venky Shankar> `mds_bal_replicate_threshold` |
2024-06-12T05:37:57.327Z | <Venky Shankar> lowering this causes the subtree to be replicated much more frequently |
2024-06-12T05:38:27.735Z | <Xiubo Li> Okay, I will try it. BTW, is any load needed to trigger it with this? |
2024-06-12T05:39:50.416Z | <Venky Shankar> nothing specific I think, but some workunit from fs:workload would suffice |
2024-06-12T05:40:35.762Z | <Xiubo Li> Sure, let me have a try. thanks very much @Venky Shankar |
2024-06-12T05:58:56.665Z | <Rishabh Dave> @Venky Shankar In the context of <https://github.com/ceph/ceph/pull/54620#discussion_r1634309624>, `get_ceph_shell_stdout` is needed for tests in this PR. Can we keep it? There are similar methods, but for running Ceph commands (`raw_cluster_cmd()`, `get_ceph_cmd_stdout()`) |
2024-06-12T07:00:04.269Z | <Rishabh Dave> 2nd question -
I am fine with the current changes you quoted here - <https://github.com/ceph/ceph/pull/54620#discussion_r1631026357>. The average of percentages and the percentage of averages are the same IMO. Let me know if you are not fine with the changes here and I need to replace them. |
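Note on the "average of percentage vs. percentage of average" point above: the two agree when every percentage is taken over the same denominator, and can differ otherwise. This log does not say which case applies to the PR. Below is a minimal Python sketch; the function names and sample numbers are hypothetical and are not taken from the PR.

```python
def average_of_percentages(pairs):
    """Mean of the per-item percentages, each computed as part * 100 / whole."""
    return sum(part * 100 / whole for part, whole in pairs) / len(pairs)


def percentage_of_averages(pairs):
    """Percentage formed from the mean part over the mean whole."""
    mean_part = sum(part for part, _ in pairs) / len(pairs)
    mean_whole = sum(whole for _, whole in pairs) / len(pairs)
    return mean_part * 100 / mean_whole


# Hypothetical (part, whole) samples, for illustration only.
equal_wholes = [(10, 100), (30, 100)]    # same denominator for every item
unequal_wholes = [(10, 100), (30, 50)]   # denominators differ

# With equal denominators the two quantities coincide.
same = (average_of_percentages(equal_wholes),
        percentage_of_averages(equal_wholes))

# With unequal denominators they can diverge.
different = (average_of_percentages(unequal_wholes),
             percentage_of_averages(unequal_wholes))

print(same)
print(different)
```

If the values being averaged all share one denominator, either form gives the same number, so the choice is only about readability.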
2024-06-12T07:00:57.926Z | <Venky Shankar> I'm fine with that too |
2024-06-12T07:01:04.069Z | <Venky Shankar> but just comment it out |
2024-06-12T07:01:25.964Z | <Venky Shankar> the more pressing point is the condition for >1.0 check |
2024-06-12T07:01:53.362Z | <Rishabh Dave> > but just comment it out
okay, will get it done. |
2024-06-12T07:03:29.900Z | <Rishabh Dave> > the more pressing point is the condition for >1.0 check
i was about to ask about that too. i think you are referring to this thread, right? <https://github.com/ceph/ceph/pull/54620#discussion_r1631000670> |
2024-06-12T07:04:39.910Z | <Rishabh Dave> it's just defensive programming. |
2024-06-12T07:04:59.272Z | <Venky Shankar> that's done when there is an unknown bug |
2024-06-12T07:06:27.349Z | <Rishabh Dave> okay, i was just being extra defensive but i've decided to remove it since it causes confusion to the reader. i'll push the fix in some time. |
2024-06-12T07:10:39.622Z | <Venky Shankar> sure |
2024-06-12T07:11:46.212Z | <Rishabh Dave> In the context of <https://github.com/ceph/ceph/pull/54620#discussion_r1634309624>, `get_ceph_shell_stdout()` is very useful for the >1000 LoC of tests that have been added in this PR. Can we keep it? There are similar methods for running Ceph commands (`raw_cluster_cmd()`, `get_ceph_cmd_stdout()`) |
2024-06-12T07:12:07.145Z | <Venky Shankar> I'd expect this reply in the PR please |
2024-06-12T07:12:22.241Z | <Venky Shankar> I think we already spoke about this? |
2024-06-12T07:12:31.119Z | <Venky Shankar> it's a straightforward change |
2024-06-12T07:12:51.193Z | <Venky Shankar> but the only reason I'm not inclined to include it in this PR is the code churn |
2024-06-12T07:16:02.063Z | <Rishabh Dave> > is the code churn
I agree, which is why I am using this method only in the new tests that are being added in the PR and not anywhere else. |
2024-06-12T07:16:38.112Z | <Rishabh Dave> no intention to refactor any existing QA code with it |
2024-06-12T07:17:18.335Z | <Venky Shankar> I'll recheck when I review the change again. |
2024-06-12T07:17:34.517Z | <Venky Shankar> Seems like something we can keep in this change then. |
2024-06-12T07:17:46.563Z | <Venky Shankar> Let's focus on other changes/comments first. |
2024-06-12T07:18:10.120Z | <Rishabh Dave> yes, getting rest of changes incorporated |
2024-06-12T07:22:20.265Z | <Rishabh Dave> thanks @Venky Shankar! |
2024-06-12T09:17:52.681Z | <Neeraj Pratap Singh> @Venky Shankar @Rishabh Dave if you are collecting PRs for teuthology testing, <https://github.com/ceph/ceph/pull/49974> is ready for testing, pls include it. |
2024-06-12T14:34:40.206Z | <Jos Collin> @Venky Shankar Please approve <https://github.com/ceph/ceph/pull/56700>. |