2024-07-08T02:46:47.735Z | <Xiubo Li> Found the root cause, it's a deadlock between the `lock cache` and the `create/unlink` , please see <https://tracker.ceph.com/issues/65607#note-7>.
Need to find a proper way to fix it |
2024-07-08T04:33:32.327Z | <Xiubo Li> @Patrick Donnelly @Venky Shankar I want to add one mds option to allow us to disable the `lock cache` feature together with bug fixing, could you foresee any potential issue for this ? |
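A minimal standalone sketch of the kind of kill switch being proposed here: a boolean MDS option that, when turned off, keeps the create/unlink path away from the lock cache entirely until the deadlock is fixed. The option name `mds_server_enable_lock_cache` and every function below are assumptions for illustration, not the actual MDS code.
```
// Minimal standalone sketch of the proposed escape hatch: a boolean MDS
// option that, when disabled, makes the create/unlink path skip lock-cache
// setup and fall back to ordinary lock acquisition. Names are illustrative.
#include <iostream>
#include <string>

// Stand-in for reading a boolean option from the MDS configuration
// (the real code would go through the config proxy).
bool get_bool_option(const std::string& name, bool default_value) {
  (void)name;  // a real lookup would consult the running config here
  return default_value;
}

void take_locks_normally() {
  std::cout << "acquiring create/unlink locks without a lock cache\n";
}

void setup_lock_cache_and_take_locks() {
  std::cout << "creating/reusing a lock cache for async create/unlink\n";
}

void handle_create_or_unlink() {
  const bool lock_cache_enabled =
      get_bool_option("mds_server_enable_lock_cache", /*default_value=*/true);

  if (!lock_cache_enabled) {
    // With the feature switched off, the code path that can deadlock
    // against the lock cache is never entered.
    take_locks_normally();
    return;
  }
  setup_lock_cache_and_take_locks();
}

int main() {
  handle_create_or_unlink();
  return 0;
}
```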
2024-07-08T04:39:15.717Z | <Jos Collin> @Venky Shankar Is the PR ready? |
2024-07-08T04:40:20.140Z | <Venky Shankar> no, sorry! |
2024-07-08T04:40:27.918Z | <Venky Shankar> I worked on it on Friday |
2024-07-08T04:40:36.617Z | <Venky Shankar> but couldn't finish it |
2024-07-08T04:40:50.079Z | <Jos Collin> ok |
2024-07-08T04:40:58.314Z | <Venky Shankar> Give me a few hours |
2024-07-08T04:41:01.765Z | <Venky Shankar> will push a fix |
2024-07-08T04:41:57.623Z | <Jos Collin> ok |
2024-07-08T05:57:21.460Z | <Jos Collin> @Venky Shankar <https://tracker.ceph.com/issues/64761>, what is the maximum number of sync operations? From what I read in some of the qa tests, it should be 3? |
2024-07-08T06:01:13.877Z | <Jos Collin> That should be a small value. |
2024-07-08T06:16:50.490Z | <Venky Shankar> checking |
2024-07-08T06:17:20.243Z | <Venky Shankar> there are 4 sync threads I believe |
2024-07-08T06:21:20.532Z | <Jos Collin> I was referring to <https://github.com/ceph/ceph/blob/main/qa/tasks/cephfs/test_mirroring.py#L1332>. |
2024-07-08T06:21:30.248Z | <Jos Collin> Looks like 3 by default? |
2024-07-08T06:21:30.595Z | <Venky Shankar> ah |
2024-07-08T06:21:35.795Z | <Venky Shankar> then yes 🙂 |
2024-07-08T06:21:41.677Z | <Venky Shankar> You can check PeerReplayer.cc |
2024-07-08T06:27:02.729Z | <Jos Collin> ok, will do. I think adding a configurable for setting the number of ops, plus a check that rejects new syncs when the throttle is reached somewhere in sync_snaps(), would suffice. |
2024-07-08T06:27:29.445Z | <Venky Shankar> well, throttler would actually do that |
2024-07-08T06:27:40.602Z | <Venky Shankar> so, basically PeerReplayer can inherit throttler |
2024-07-08T06:27:55.514Z | <Venky Shankar> no need to code the throttling again |
2024-07-08T06:28:00.383Z | <Jos Collin> aah, that's cool |
2024-07-08T06:28:01.274Z | <Venky Shankar> check usages of Throttle class |
2024-07-08T06:28:04.082Z | <Venky Shankar> yeh +1 |
2024-07-08T06:28:14.011Z | <Jos Collin> Let me check that then. |
2024-07-08T06:28:26.934Z | <Jos Collin> thx |
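A standalone sketch of the throttling pattern Venky and Jos settle on above: admit at most N concurrent snapshot syncs and reject (or defer) the rest. Ceph's `common/Throttle.h` provides the real primitives; the toy class below only mimics `get_or_fail()`/`put()` semantics so the example compiles on its own, and the limit of 3 plus the idea of wiring it to a cephfs-mirror configurable are assumptions.
```
// Toy model of the throttle-based cap on concurrent snapshot syncs.
// The real implementation would use Ceph's Throttle (common/Throttle.h)
// inside PeerReplayer rather than this hand-rolled class.
#include <cstdint>
#include <iostream>
#include <mutex>

class ToyThrottle {
public:
  explicit ToyThrottle(int64_t max) : max_(max) {}

  // Try to take `c` slots; return false instead of blocking when full
  // (modelled after Throttle::get_or_fail()).
  bool get_or_fail(int64_t c = 1) {
    std::lock_guard<std::mutex> l(lock_);
    if (count_ + c > max_)
      return false;
    count_ += c;
    return true;
  }

  // Release slots once a sync finishes (modelled after Throttle::put()).
  void put(int64_t c = 1) {
    std::lock_guard<std::mutex> l(lock_);
    count_ -= c;
  }

private:
  std::mutex lock_;
  int64_t max_;
  int64_t count_ = 0;
};

int main() {
  // A cap of 3 mirrors the default seen in the qa test; in the real daemon
  // this would come from a cephfs-mirror configurable.
  ToyThrottle sync_throttle(3);

  for (int dir = 0; dir < 5; ++dir) {
    if (sync_throttle.get_or_fail()) {
      std::cout << "dir " << dir << ": snapshot sync admitted\n";
      // ... sync_snaps(dir) would run here, followed by sync_throttle.put() ...
    } else {
      std::cout << "dir " << dir << ": throttled, will retry later\n";
    }
  }
  return 0;
}
```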
2024-07-08T08:51:00.638Z | <Jos Collin> @Venky Shankar Today's standup at 4pm India time? |
2024-07-08T09:21:04.170Z | <Venky Shankar> yes |
2024-07-08T09:21:12.522Z | <Venky Shankar> Most folks in US are on PTO this week. |
2024-07-08T12:50:57.982Z | <Dhairya Parmar> @Venky Shankar remember the upstream user who hit <https://tracker.ceph.com/issues/61009> recently? Well, they got their fs up and running, but unfortunately the filesystem crashed again with
```/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.7/rpm/el8/BUILD/ceph-17.2.7/src/mds/MDCache.cc: 275: FAILED ceph_assert(!p)
ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x135) [0x7fc4693334a3]
2: /usr/lib64/ceph/libceph-common.so.2(+0x269669) [0x7fc469333669]
3: /usr/bin/ceph-mds(+0x2d2f04) [0x55caed408f04]
4: (MDCache::cow_inode(CInode*, snapid_t)+0x334) [0x55caed40eeb4]
5: (MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)+0x8d2) [0x55caed410792]
6: (Server::_unlink_local(boost::intrusive_ptr<MDRequestImpl>&, CDentry*, CDentry*)+0x966) [0x55caed35b596]
7: (Server::handle_client_request(boost::intrusive_ptr<MClientRequest const> const&)+0x6d5) [0x55caed3ab295]
8: (Server::dispatch(boost::intrusive_ptr<Message const> const&)+0x13c) [0x55caed3aff6c]
9: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)+0x5db) [0x55caed2f827b]
10: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)+0x5c) [0x55caed2f88bc]
11: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x1bf) [0x55caed2e1c2f]
12: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x478) [0x7fc4695aeed8]
13: (DispatchQueue::entry()+0x50f) [0x7fc4695ac31f]
14: (DispatchQueue::DispatchThread::entry()+0x11) [0x7fc469676381]
15: /lib64/libpthread.so.0(+0x81ca) [0x7fc4683211ca]
16: clone()``` |
2024-07-08T12:51:37.267Z | <Venky Shankar> oh dear |
2024-07-08T12:51:39.239Z | <Venky Shankar> that's nasty |
2024-07-08T12:51:40.096Z | <Venky Shankar> 😕 |
2024-07-08T12:51:48.594Z | <Venky Shankar> they are using kclient I believe? |
2024-07-08T12:52:14.432Z | <Dhairya Parmar> yeah, and to add to the issues, they even turned off `nowsync` on their kclient mounts |
2024-07-08T12:52:44.724Z | <Venky Shankar> `nowsync` does not affect the `_unlink_local` crash |
2024-07-08T12:53:20.735Z | <Dhairya Parmar> okay |
2024-07-08T12:53:31.876Z | <Dhairya Parmar> I'm reading the cow_inode code |
2024-07-08T12:53:43.599Z | <Dhairya Parmar> and the comment is interesting
```/*
* note: i'm currently cheating wrt dirty and inode.version on cow
* items. instead of doing a full dir predirty, i just take the
* original item's version, and set the dirty flag (via
* mutation::add_cow_{inode,dentry}() and mutation::apply(). that
* means a special case in the dir commit clean sweep assertions.
* bah.
*/``` |
2024-07-08T12:53:50.119Z | <Venky Shankar> kclient passed in that bogus value I believe |
2024-07-08T12:54:12.795Z | <Dhairya Parmar> while replaying the journal, you mean? |
2024-07-08T12:55:02.458Z | <Venky Shankar> nope |
2024-07-08T12:55:12.065Z | <Venky Shankar> In a client request and that got recorded in the journal |
2024-07-08T12:55:25.698Z | <Dhairya Parmar> oh! |
2024-07-08T12:55:27.450Z | <Venky Shankar> see: <https://tracker.ceph.com/issues/54546#note-7> |
2024-07-08T12:55:29.670Z | <Dhairya Parmar> oh! |
2024-07-08T12:58:14Z | <Dhairya Parmar> the assertion failures are different |
2024-07-08T12:59:04.204Z | <Venky Shankar> kk |
2024-07-08T12:59:09.301Z | <Venky Shankar> _unlink_local gave it away |
2024-07-08T12:59:15.319Z | <Venky Shankar> I believe, this could be kind of related |
2024-07-08T12:59:34.055Z | <Venky Shankar> btw, did the user run scrub post recovering from the interval_set failure? |
2024-07-08T13:00:10.002Z | <Dhairya Parmar> yeah they did run it, here's what they did
```We finished the scrub of the filesystem on Friday and ran a repair scrub on the 1 directory which had metadata damage. After doing so and rebooting, the cluster reported no issues and data was accessible again.``` |
2024-07-08T13:00:27.288Z | <Dhairya Parmar> so post scrub they did a repair scrub, does that ring any bells? |
2024-07-08T13:01:16.076Z | <Venky Shankar> that's expected |
2024-07-08T13:01:21.242Z | <Venky Shankar> some repair might be normal |
2024-07-08T13:01:36.298Z | <Dhairya Parmar> okay |
2024-07-08T13:01:49.767Z | <Venky Shankar> I'll have to check what could have happened |
2024-07-08T13:02:31.445Z | <Dhairya Parmar> https://files.slack.com/files-pri/T1HG3J90S-F07BTTXH3NV/download/ceph-mds.pebbles-s2.log-20240708.gz |
2024-07-08T13:03:49.307Z | <Dhairya Parmar> this is the mds log that contains that assertion failure. once they hit this, they reset the journal again (basically following the same steps I sent them), but now the fs is in the failed state |
2024-07-08T13:04:03.057Z | <Dhairya Parmar> they've shared logs. I'm downloading them currently |
2024-07-08T13:07:43.409Z | <Venky Shankar> I'm not familiar with this backtrace, so resetting the journal might not help. |
2024-07-08T13:07:54.116Z | <Venky Shankar> My guess is the fs is in a pretty bad state 😕 |
2024-07-08T13:08:05.294Z | <Dhairya Parmar> yikes |
2024-07-08T13:10:32.807Z | <Dhairya Parmar> done downloading the log |
2024-07-08T13:10:39.836Z | <Dhairya Parmar> it is filled with this line
```2024-07-08T10:19:19.935+0100 7f5201d08700 -1 log_channel(cluster) log [ERR] : loaded dup inode 0x1005e1773e4 [565,head] v3196 at /cephfs2-users/vchang/20240701_VC143-2_GluA3ATD-NB65/Movies/Images-Disc1/GridSquare_896930/Data/FoilHole_2014315_Data_918775_16_20240702_111309_Fractions.tiff.vifeyek0.partial, but inode 0x1005e1773e4.head v3198 already exists at /cephfs2-users/vchang/20240701_VC143-2_GluA3ATD-NB65/Movies/Images-Disc1/GridSquare_896930/Data/FoilHole_2014315_Data_918775_16_20240702_111309_Fractions.tiff``` |
2024-07-08T13:13:09.939Z | <Venky Shankar> dup inodes - two (non-hardlink) inodes which have the same inode number 🤷 |
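A standalone illustration of the invariant behind both symptoms: the MDS cache keeps at most one in-memory inode per inode number, the `loaded dup inode` error appears to be logged when a loaded dentry points at an ino that is already linked elsewhere, and the `ceph_assert(!p)` in the earlier backtrace looks like the same invariant being enforced when a new inode is inserted into the cache. The class and function names below are illustrative, not the actual MDCache code.
```
// Toy version of the "one in-memory inode per ino" invariant.
#include <cassert>
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>

struct ToyInode {
  uint64_t ino;
  std::string path;  // where the dentry linking this inode lives
};

class ToyInodeMap {
public:
  // Inserting a second inode with the same ino is a fatal invariant
  // violation -- analogous to the ceph_assert(!p) in the backtrace.
  void add_inode(const ToyInode& in) {
    assert(map_.count(in.ino) == 0);
    map_.emplace(in.ino, in);
  }

  // Loading a dentry whose ino is already linked elsewhere logs a
  // "dup inode" style error instead of linking it again.
  void link_loaded_dentry(const ToyInode& in) {
    auto it = map_.find(in.ino);
    if (it != map_.end()) {
      std::cerr << "loaded dup inode 0x" << std::hex << in.ino << std::dec
                << " at " << in.path << ", but that ino already exists at "
                << it->second.path << "\n";
      return;
    }
    add_inode(in);
  }

private:
  std::unordered_map<uint64_t, ToyInode> map_;
};

int main() {
  ToyInodeMap cache;
  cache.link_loaded_dentry({0x1005e1773e4, "/some/dir/file.partial"});
  cache.link_loaded_dentry({0x1005e1773e4, "/some/dir/file"});  // dup -> error
  return 0;
}
```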
2024-07-08T13:16:33.034Z | <Dhairya Parmar> did something go wrong while flushing? |
2024-07-08T13:18:07.538Z | <system> file ceph-mds.pebbles-s4.log.gz too big to download (3582651 > allowed size: 1000000) |
2024-07-08T13:18:07.538Z | <Dhairya Parmar> full log post second journal reset attempt |
2024-07-08T13:21:01.579Z | <Venky Shankar> Did someone recommend resetting the journal? |
2024-07-08T13:21:24.171Z | <Dhairya Parmar> nope. they must've thought it was the same issue again and went for it |
2024-07-08T13:22:43.480Z | <Dhairya Parmar> but the backtraces are the same, it has the same assertion failure |
2024-07-08T13:32:54.772Z | <Venky Shankar> we don't have enough information about what could have gone wrong |
2024-07-08T13:55:05.334Z | <Dhairya Parmar> what should I ask them for better insights? |
2024-07-08T13:55:26.112Z | <Dhairya Parmar> and what would be the solution to bail the fs out of this situation? |