2024-09-10T04:32:28.363Z | <jcollin> @Rishabh Dave for your next QA <https://github.com/ceph/ceph/pull/59566> |
2024-09-10T04:54:12.238Z | <Venky Shankar> @Rishabh Dave <https://tracker.ceph.com/issues/63154> -- It's good to clear the backport field when a tracker does not require a backport, to avoid any confusion when it is marked as resolved. |
2024-09-10T06:32:20.692Z | <Rishabh Dave> Thanks for the ping. |
2024-09-10T06:56:07.956Z | <Rishabh Dave> Jos, please add the needs-qa label to the PR in future; this makes it possible to spot such PRs when going through the CephFS PR list. For now, Greg has added it. |
2024-09-10T06:56:33.100Z | <Rishabh Dave> Right, I'll keep that in mind. |
2024-09-10T07:25:43.718Z | <Venky Shankar> BTW, any reason it wasn't backported to reef? |
2024-09-10T07:26:11.793Z | <Venky Shankar> I ask because `fs swap` was backported to reef, so it seems like we should also port this change to reef. |
2024-09-10T07:27:12.319Z | <Rishabh Dave> I am also curious about this. I don't remember the conversation we had on this one. I'll find out and reply. |
2024-09-10T07:28:00.338Z | <Venky Shankar> It's good to record any decision not to backport before closing the tracker; otherwise, we'll always be left guessing later. |
2024-09-10T07:36:49.900Z | <Rishabh Dave> @Venky Shankar Copying your comment from <https://tracker.ceph.com/issues/67493#note-11> for reference -
> Rishabh, this batch has just cephfs-mirror PRs and I would want to rerun fs:mirror and mirror-ha suites. Please do the needful.
Approx. how many jobs should I launch from `fs:mirror` and `fs:mirror-ha`? And should a full fs suite QA run be re-launched, or be left in favour of the last one? |
2024-09-10T07:37:42.397Z | <Venky Shankar> There are just cephfs-mirror PRs, right? |
2024-09-10T07:37:48.381Z | <Venky Shankar> so fs:mirror and the HA suite would suffice |
2024-09-10T07:38:02.028Z | <Venky Shankar> just use `-s fs:mirror` (and one for ha) |
2024-09-10T07:39:08.721Z | <Rishabh Dave> okay. so all jobs launched by `-s fs:mirror` and `-s fs:mirror-ha` ... |
2024-09-10T07:39:18.856Z | <Venky Shankar> yes |
2024-09-10T07:39:22.149Z | <Rishabh Dave> got it. |
2024-09-10T07:39:48.070Z | <Rishabh Dave> > There are just cephfs-mirror PRs, right?
there's one PR with a minor valgrind change - <https://github.com/ceph/ceph/pull/59069> |
2024-09-10T07:40:06.337Z | <Rishabh Dave> so `fs:valgrind` as well? |
2024-09-10T07:42:27.491Z | <Venky Shankar> yes |
2024-09-10T07:42:32.138Z | <Venky Shankar> actually |
2024-09-10T07:42:41.656Z | <Venky Shankar> better would be to run with --filter mirror |
2024-09-10T07:42:50.593Z | <Venky Shankar> that would take care of all mirror related tests. |
2024-09-10T07:44:00.021Z | <Rishabh Dave> > better would be to run with --filter mirror
yeah, I was going to do that.
so `--filter mirror,valgrind` looks right? All 3 will be launched in the same run/batch. |
2024-09-10T08:43:21Z | <Venky Shankar> wouldn't just `--filter mirror` work? |
2024-09-10T08:43:29.831Z | <Venky Shankar> that would also pick up the valgrind job for mirror |
2024-09-10T09:40:40.489Z | <Rishabh Dave> okay. |
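For reference, the filtered run agreed on above would presumably be scheduled with a single `teuthology-suite` invocation against the full fs suite, roughly as sketched below (the branch name, machine type and priority are placeholders, not values taken from this conversation):
```
teuthology-suite \
    --ceph wip-my-testing-branch \
    --machine-type smithi \
    --priority 100 \
    --suite fs \
    --filter mirror
```
Since `--filter` matches against job descriptions, any fs job whose description contains "mirror" gets scheduled, which is why the fs:mirror, fs:mirror-ha and mirror valgrind jobs are all covered without a separate valgrind filter.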
2024-09-10T11:53:36.964Z | <Dhairya Parmar> hey @Venky Shankar, (re: <https://tracker.ceph.com/issues/66581>) the context list is empty throughout the process, i.e. `wait_on_context_list(in->waitfor_caps);` is waiting for an empty list. Is this fine? |
2024-09-10T12:02:18.841Z | <Venky Shankar> Has the client requested any caps from the MDS? |
2024-09-10T12:02:28.604Z | <Venky Shankar> read caps to satisfy the read operation? |
2024-09-10T12:02:53.618Z | <Venky Shankar> It can be empty if the client has the required caps (so there isn't anything to ask the mds for) |
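To make the empty-list case concrete, here's a minimal, self-contained sketch of the waiter-list pattern under discussion (the `Context` stand-in, the helper bodies and the `main()` driver are invented for illustration; this is not the actual Client.cc code): a waiter is queued on `in->waitfor_caps` only when caps are missing, so an empty list usually just means the caps were already issued.
```
#include <condition_variable>
#include <iostream>
#include <list>
#include <mutex>
#include <thread>

struct Context {};  // stand-in for a completion callback

struct Inode {
  int issued_caps = 0;               // caps the client currently holds
  std::list<Context*> waitfor_caps;  // waiters parked until caps arrive
};

std::mutex client_lock;              // the client-wide lock from the discussion
std::condition_variable cond;

// Block the caller until the context list is drained (i.e. the caps arrive).
// If the list is already empty, there is nothing to wait for.
void wait_on_context_list(std::unique_lock<std::mutex>& l,
                          std::list<Context*>& ls) {
  cond.wait(l, [&ls] { return ls.empty(); });
}

// Simplified cap acquisition: a waiter is queued only when caps are missing.
void get_caps(Inode& in, int need) {
  std::unique_lock<std::mutex> l(client_lock);
  if ((in.issued_caps & need) == need) {
    // The client already has the required caps: nothing is queued on
    // waitfor_caps, so the list stays empty (the case described above).
    return;
  }
  in.waitfor_caps.push_back(new Context{});
  wait_on_context_list(l, in.waitfor_caps);  // woken by the "MDS reply" below
}

int main() {
  Inode in;  // no caps issued yet, so the reader below has to wait
  std::thread reader([&] { get_caps(in, /*need=*/1); });

  // Simulate the MDS granting caps: drain the waiter list and wake readers.
  {
    std::lock_guard<std::mutex> g(client_lock);
    in.issued_caps |= 1;
    for (Context* c : in.waitfor_caps) delete c;
    in.waitfor_caps.clear();
  }
  cond.notify_all();
  reader.join();
  std::cout << "caps granted, waiter list drained\n";
}
```
In this sketch the read path only parks itself when caps are missing, so an empty `waitfor_caps` is not by itself a sign that the wait machinery is broken.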
2024-09-10T12:30:05.629Z | <Milind Changire> @Patrick Donnelly review this please - <https://github.com/ceph/ceph/pull/57459> |
2024-09-10T12:55:11.204Z | <Dhairya Parmar> @Venky Shankar since the read call log appears only after the quiesce is complete, and the context list is empty, I'm leaning towards the theory that the client lock mutex is held by the write call |
2024-09-10T12:55:21.867Z | <Venky Shankar> kk, in that case you need to figure out which thread is holding the client lock while the whole quiesce cycle is underway. |
2024-09-10T13:08:51.213Z | <Kotresh H R> Yeah, it got missed. There is a test case failure reported. I will get it root caused and merged asap |
2024-09-10T17:10:57.237Z | <Rishabh Dave> The run finished some time ago. I've gone through 35 failed jobs and recognized 13 issues. Following are those failures, summarized in the format we use for the QA run history wiki -
```<https://tracker.ceph.com/issues/67967>
<https://pulpito.ceph.com/rishabh-2024-09-10_06:58:10-fs-wip-rishabh-testing-20240909.165402-squid-distro-default-smithi/>
* <https://tracker.ceph.com/issues/66877>
qa/cephfs: ignore OSD down warning in cluster log
* <https://tracker.ceph.com/issues/57676>
qa: error during scrub thrashing: rank damage found: {'backtrace'}
* <https://tracker.ceph.com/issues/67030>
RuntimeError: error during quiesce thrashing: Error quiescing set 'd3bb2849': 1 (EPERM)
* <https://tracker.ceph.com/issues/65020>
qa: Scrub error on inode 0x1000000356c (/volumes/qa/sv_0/2f8f6bb4-3ea9-47a0-bd79-a0f50dc149d5/client.0/tmp/clients/client7/~dmtmp/PARADOX) see mds.b log and `damage ls` output for details" in cluster log
* <https://tracker.ceph.com/issues/67595>
valgrind error: Leak_PossiblyLost posix_memalign UnknownInlinedFun ceph::buffer::v15_2_0::create_aligned_in_mempool(unsigned int, unsigned int, int)
* <https://tracker.ceph.com/issues/65779>
qa: valgrind error: Leak_StillReachable calloc calloc __trans_list_add
* <https://tracker.ceph.com/issues/64502>
pacific/quincy/v18.2.0: client: ceph-fuse fails to unmount after upgrade to main
* <https://tracker.ceph.com/issues/57656>
qa: dbench.sh fails with Resource Temporarily Unavailable
* <https://tracker.ceph.com/issues/64711>
Test failure: test_cephfs_mirror_cancel_mirroring_and_readd
* <https://tracker.ceph.com/issues/63700>
qa: test_cd_with_args failure
* <https://tracker.ceph.com/issues/65372>
qa: The following counters failed to be set on mds daemons: {'mds.exported', 'mds.imported'}
* <https://tracker.ceph.com/issues/66639>
qa: cluster [WRN] client could not reconnect as file system flag refuse_client_session is set' in cluster log
* <https://tracker.ceph.com/issues/67336>
cephfs: test failing to decode backtrace/symlink with ceph-dencoder```
|
2024-09-10T17:13:39.562Z | <Rishabh Dave> Following are the failures -
```Unrecognized Failures
================
.. note:: no failure reason was printed by pulpito
fail 7898963 0:13:22 fs/thrash/workloads/{begin/{0-install 1-ceph 2-logrotate 3-modules} clusters/1a5s-mds-1c-client conf/{client mds mgr mon osd} distro/{ubuntu_latest} mount/fuse msgr-failures/none objectstore-ec/bluestore-comp-ec-root overrides/{client-shutdown frag ignorelist_health ignorelist_wrongly_marked_down pg_health prefetch_dirfrags/no prefetch_dirfrags/yes prefetch_entire_dirfrags/no prefetch_entire_dirfrags/yes races session_timeout thrashosds-health} ranks/3 tasks/{1-thrash/with-quiesce 2-workunit/suites/pjd}} 2
fail 7898916 0:23:41 fs/functional/{begin/{0-install 1-ceph 2-logrotate 3-modules} clusters/1a3s-mds-4c-client conf/{client mds mgr mon osd} distro/{centos_latest} mount/kclient/{mount-syntax/{v2} mount overrides/{distro/testing/k-testing ms-die-on-skipped}} objectstore/bluestore-bitmap overrides/{ignorelist_health ignorelist_wrongly_marked_down no_client_pidfile pg_health} subvol_versions/create_subvol_version_v1 tasks/scrub} 2
Failure Reason:
Test failure: test_scrub_backtrace (tasks.cephfs.test_scrub.TestScrub)
fail 7898945 0:12:05 fs/functional/{begin/{0-install 1-ceph 2-logrotate 3-modules} clusters/1a3s-mds-4c-client conf/{client mds mgr mon osd} distro/{ubuntu_latest} mount/kclient/{mount-syntax/{v1} mount overrides/{distro/stock/{centos_9.stream k-stock} ms-die-on-skipped}} objectstore/bluestore-ec-root overrides/{ignorelist_health ignorelist_wrongly_marked_down no_client_pidfile pg_health} subvol_versions/create_subvol_version_v2 tasks/strays} 2
Failure Reason:
Test failure: test_hardlink_reintegration (tasks.cephfs.test_strays.TestStrays)
fail 7898865 1:10:29 fs/snaps/{begin/{0-install 1-ceph 2-logrotate 3-modules} clusters/1a3s-mds-1c-client conf/{client mds mgr mon osd} distro/{ubuntu_latest} mount/fuse objectstore-ec/bluestore-ec-root overrides/{ignorelist_health ignorelist_wrongly_marked_down pg_health} tasks/workunit/snaps} 2
Failure Reason:
"2024-09-10T07:57:51.490832+0000 mon.a (mon.0) 363 : cluster [WRN] Health check failed: 1 OSD(s) experiencing slow operations in BlueStore (BLUESTORE_SLOW_OP_ALERT)" in cluster log
fail 7898892 3:17:03 fs/upgrade/featureful_client/upgraded_client/{bluestore-bitmap centos_9.stream clusters/1-mds-2-client-micro conf/{client mds mgr mon osd} overrides/{ignorelist_health ignorelist_wrongly_marked_down multimds/no multimds/yes pg-warn pg_health} tasks/{0-from/quincy 1-client 2-upgrade 3-client-upgrade 4-compat_client 5-client-sanity}} 3
Failure Reason:
Command failed (workunit test suites/fsstress.sh) on smithi196 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.1/client.1/tmp && cd -- /home/ubuntu/cephtest/mnt.1/client.1/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=1cf819b7212a6048e92fc78c4eff963344f6d56b TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="1" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.1 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.1 CEPH_MNT=/home/ubuntu/cephtest/mnt.1 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.1/qa/workunits/suites/fsstress.sh'
fail 7898976 0:18:14 fs/thrash/workloads/{begin/{0-install 1-ceph 2-logrotate 3-modules} clusters/1a5s-mds-1c-client conf/{client mds mgr mon osd} distro/{centos_latest} mount/kclient/{mount-syntax/{v2} mount overrides/{distro/stock/{centos_9.stream k-stock} ms-die-on-skipped}} msgr-failures/osd-mds-delay objectstore-ec/bluestore-comp overrides/{client-shutdown frag ignorelist_health ignorelist_wrongly_marked_down pg_health prefetch_dirfrags/no prefetch_dirfrags/yes prefetch_entire_dirfrags/no prefetch_entire_dirfrags/yes races session_timeout thrashosds-health} ranks/5 tasks/{1-thrash/with-quiesce 2-workunit/fs/snaps}} 2
Failure Reason:
Command failed (workunit test fs/snaps/snaptest-git-ceph.sh) on smithi057 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=1cf819b7212a6048e92fc78c4eff963344f6d56b TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 6h /home/ubuntu/cephtest/clone.client.0/qa/workunits/fs/snaps/snaptest-git-ceph.sh'
fail 7899039 0:55:55 fs/workload/{0-centos_9.stream begin/{0-install 1-cephadm 2-logrotate 3-modules} clusters/1a11s-mds-1c-client-3node conf/{client mds mgr mon osd} mount/kclient/{base/{mount-syntax/{v2} mount overrides/{distro/testing/k-testing ms-die-on-skipped}} ms_mode/legacy wsync/yes} objectstore-ec/bluestore-comp-ec-root omap_limit/10000 overrides/{cephsqlite-timeout frag ignorelist_health ignorelist_wrongly_marked_down osd-asserts pg_health session_timeout} ranks/1 standby-replay tasks/{0-subvolume/{with-quota} 1-check-counter 2-scrub/yes 3-snaps/no 4-flush/no 5-quiesce/with-quiesce 6-workunit/suites/ffsb}} 3
Failure Reason:
"2024-09-10T10:55:10.672242+0000 mon.a (mon.0) 1169 : cluster [WRN] Health check failed: failed to probe daemons or devices (CEPHADM_REFRESH_FAILED)" in cluster log
ffsb.sh
-------
fail 7898872 0:33:14 fs/workload/{0-centos_9.stream begin/{0-install 1-cephadm 2-logrotate 3-modules} clusters/1a11s-mds-1c-client-3node conf/{client mds mgr mon osd} mount/kclient/{base/{mount-syntax/{v2} mount overrides/{distro/stock/{centos_9.stream k-stock} ms-die-on-skipped}} ms_mode/crc wsync/no} objectstore-ec/bluestore-bitmap omap_limit/10 overrides/{cephsqlite-timeout frag ignorelist_health ignorelist_wrongly_marked_down osd-asserts pg_health session_timeout} ranks/1 standby-replay tasks/{0-subvolume/{with-namespace-isolated} 1-check-counter 2-scrub/no 3-snaps/yes 4-flush/no 5-quiesce/no 6-workunit/suites/ffsb}} 3
Failure Reason:
Command failed (workunit test suites/ffsb.sh) on smithi073 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=1cf819b7212a6048e92fc78c4eff963344f6d56b TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/suites/ffsb.sh'
fail 7898913 0:47:58 fs/thrash/workloads/{begin/{0-install 1-ceph 2-logrotate 3-modules} clusters/1a5s-mds-1c-client conf/{client mds mgr mon osd} distro/{ubuntu_latest} mount/kclient/{mount-syntax/{v1} mount overrides/{distro/testing/k-testing ms-die-on-skipped}} msgr-failures/none objectstore-ec/bluestore-comp-ec-root overrides/{client-shutdown frag ignorelist_health ignorelist_wrongly_marked_down pg_health prefetch_dirfrags/no prefetch_dirfrags/yes prefetch_entire_dirfrags/no prefetch_entire_dirfrags/yes races session_timeout thrashosds-health} ranks/1 tasks/{1-thrash/with-quiesce 2-workunit/suites/ffsb}} 2
Failure Reason:
Command failed (workunit test suites/ffsb.sh) on smithi136 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=1cf819b7212a6048e92fc78c4eff963344f6d56b TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/suites/ffsb.sh'```
Dead Job -
```dead 7898898 fs/fscrypt/{begin/{0-install 1-ceph 2-logrotate 3-modules} bluestore-bitmap clusters/1-mds-1-client conf/{client mds mgr mon osd} distro/{centos_latest} mount/kclient/{mount-syntax/{v1} mount overrides/{distro/testing/k-testing ms-die-on-skipped}} overrides/{ignorelist_health ignorelist_health_more ignorelist_wrongly_marked_down osd pg-warn pg_health} tasks/{0-client 1-tests/fscrypt-common}}```
|
2024-09-10T17:16:51.040Z | <Rishabh Dave> The run finished some time ago. It has 43 failures, 1 dead and 1 running (that is stuck IMO). I've gone through 35 failed jobs and recognized 13 issues. I've summarized them below.
9 failures and 1 dead job are left to be checked. @Venky Shankar Can you please check them and merge the PR tomorrow? I would've finished this review but I need to catch a train in an hour. In case it's not possible, I'll continue as soon as I get back from the PTO.
Link to the QA ticket - <https://tracker.ceph.com/issues/67967>
Link to the PR that is being tested - <https://github.com/ceph/ceph/pull/59672>
Following are the issues I found in the failures I went through; they are summarized in the format we use for the QA run history wiki -
```<https://tracker.ceph.com/issues/67967>
<https://pulpito.ceph.com/rishabh-2024-09-10_06:58:10-fs-wip-rishabh-testing-20240909.165402-squid-distro-default-smithi/>
* <https://tracker.ceph.com/issues/66877>
qa/cephfs: ignore OSD down warning in cluster log
* <https://tracker.ceph.com/issues/57676>
qa: error during scrub thrashing: rank damage found: {'backtrace'}
* <https://tracker.ceph.com/issues/67030>
RuntimeError: error during quiesce thrashing: Error quiescing set 'd3bb2849': 1 (EPERM)
* <https://tracker.ceph.com/issues/65020>
qa: Scrub error on inode 0x1000000356c (/volumes/qa/sv_0/2f8f6bb4-3ea9-47a0-bd79-a0f50dc149d5/client.0/tmp/clients/client7/~dmtmp/PARADOX) see mds.b log and `damage ls` output for details" in cluster log
* <https://tracker.ceph.com/issues/67595>
valgrind error: Leak_PossiblyLost posix_memalign UnknownInlinedFun ceph::buffer::v15_2_0::create_aligned_in_mempool(unsigned int, unsigned int, int)
* <https://tracker.ceph.com/issues/65779>
qa: valgrind error: Leak_StillReachable calloc calloc __trans_list_add
* <https://tracker.ceph.com/issues/64502>
pacific/quincy/v18.2.0: client: ceph-fuse fails to unmount after upgrade to main
* <https://tracker.ceph.com/issues/57656>
qa: dbench.sh fails with Resource Temporarily Unavailable
* <https://tracker.ceph.com/issues/64711>
Test failure: test_cephfs_mirror_cancel_mirroring_and_readd
* <https://tracker.ceph.com/issues/63700>
qa: test_cd_with_args failure
* <https://tracker.ceph.com/issues/65372>
qa: The following counters failed to be set on mds daemons: {'mds.exported', 'mds.imported'}
* <https://tracker.ceph.com/issues/66639>
qa: cluster [WRN] client could not reconnect as file system flag refuse_client_session is set' in cluster log
* <https://tracker.ceph.com/issues/67336>
cephfs: test failing to decode backtrace/symlink with ceph-dencoder``` |
2024-09-10T17:19:49.600Z | <Rishabh Dave> @Venky Shankar
Due to soft-wrap it can be hard to read the above messages. I've copied these messages in a ticket comment here - <https://tracker.ceph.com/issues/67967#note-2> |