ceph - cephfs - 2024-07-22

Timestamp (UTC) | Message
2024-07-22T06:59:22.342Z
<Rishabh Dave> PTAL. just a reminder
2024-07-22T07:25:13.760Z
<Dhairya Parmar> hey @Frank Troost
2024-07-22T07:25:31.585Z
<Dhairya Parmar> since you mention "inconsistent" files, did you try scrub?
2024-07-22T08:59:14.422Z
<Frank Troost> Thanks for your response! I did not try to scrub yet. I started a disaster recovery but got stuck at data scan extents; after 5 days I have not seen any response from it. I remounted the original cephfs again, so it is healthy again. For now all the files are listable and the directories are visible, but the files themselves are broken and not recognized.
About the scrub, do I need to just scrub each disk? Deep or normal?
2024-07-22T10:32:55.569Z
<Dhairya Parmar> deep scrub is pretty I/O intensive
2024-07-22T10:34:29.233Z
<Dhairya Parmar> if you think your environment can handle a huge I/O burden, go for it 🙂
2024-07-22T10:34:40.818Z
<Dhairya Parmar> otherwise start with a normal scrub
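For the "scrub each disk" route, OSD scrubs are triggered per OSD or per placement group; the deep variant re-reads and checksums all object data, which is where the extra I/O comes from. A minimal sketch, with `osd.0` and a placeholder PG id:
```
# shallow scrub of one OSD, or of a single PG
ceph osd scrub 0
ceph pg scrub 2.1f

# deep scrub (re-reads and verifies all object data; much heavier I/O)
ceph osd deep-scrub 0
ceph pg deep-scrub 2.1f
```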
2024-07-22T10:49:28.549Z
<Frank Troost> At the moment photorec is running over the broken backup mirror, so I will wait for that to finish. After that I will run a scrub. Is there any way to see if a scrub will help? I have no broken PGs.
2024-07-22T10:52:27.702Z
<Dhairya Parmar> scrub is for FS consistency, since you mentioned
> The cephfs datapool only contain inconsistent files
I was motivated to suggest fs scrub to you
2024-07-22T10:53:43.559Z
<Frank Troost> Fs scrub is the same as an osd scrub?
2024-07-22T10:53:52.130Z
<Dhairya Parmar> nope
2024-07-22T10:53:56.580Z
<Dhairya Parmar> wait a sec
2024-07-22T10:53:59.052Z
<Dhairya Parmar> pulling doc
2024-07-22T10:54:03.734Z
<Frank Troost> Thnx
2024-07-22T10:54:16.463Z
<Dhairya Parmar> there you go <https://docs.ceph.com/en/latest/cephfs/scrub/>
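What the linked doc boils down to for this case, assuming the filesystem is named `cephfs` and scrubbing from the root (these are the same options that show up in the scrub status output further down):
```
# start a recursive forward scrub from the FS root; add ",repair" to let the MDS try to fix what it finds
ceph tell mds.cephfs:0 scrub start / recursive,force

# check progress
ceph tell mds.cephfs:0 scrub status
```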
2024-07-22T10:55:52.571Z
<Frank Troost> Thnx, will try it again. Last time I tried, it was gone from the status within a minute. I'll let you know!
2024-07-22T10:55:54.344Z
<Dhairya Parmar> sure
2024-07-22T10:55:59.403Z
<Dhairya Parmar> nw 🙂
2024-07-22T10:59:13.017Z
<Frank Troost> did run into this:
2024-07-22T10:59:15.442Z
<Frank Troost> 2024-07-22T12:58:08.994+0200 723fa2a006c0 0 client.47365859 ms_handle_reset on v2:192.168.1.253:6805/751201160
{
  "status": "scrub active (302176 inodes in the stack)",
  "scrubs": {
    "b85cb08f-a432-42b7-8909-7113ac0b5a66": {
      "path": "/",
      "tag": "b85cb08f-a432-42b7-8909-7113ac0b5a66",
      "options": "recursive,force"
    }
  }
}
root@dell:~# ceph tell mds.cephfs:0 scrub status
2024-07-22T12:58:19.502+0200 7589916006c0 0 client.47365905 ms_handle_reset on v2:192.168.1.253:6805/802314589
2024-07-22T12:58:19.536+0200 7589916006c0 0 client.47365909 ms_handle_reset on v2:192.168.1.253:6805/802314589
{
  "status": "no active scrubs running",
  "scrubs": {}
}
2024-07-22T11:02:25.417Z
<Dhairya Parmar> and? did it help in any way with consistency?
2024-07-22T11:15:05.844Z
<Frank Troost> looks like it's still busy, I see the following in the logs:

pgmap v212501: 414 pgs: 1 active+clean+scrubbing+deep, 413 active+clean; 8.9 TiB data, 21 TiB used, 9.4 TiB / 30 TiB avail; 63 MiB/s rd, 15 op/s

I can't reach the mount to check the files, so we need to wait to get an answer about the files, I think.
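Note that the `active+clean+scrubbing+deep` PG in that pgmap line is an OSD-level scrub, tracked separately from the MDS scrub. A quick way to watch both, assuming the filesystem is named `cephfs`:
```
# CephFS (MDS) scrub progress
ceph tell mds.cephfs:0 scrub status

# PGs currently being scrubbed by the OSDs
ceph pg dump pgs_brief | grep scrubbing
```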
2024-07-22T11:29:22.071Z
<Dhairya Parmar> yeah, will take some time
2024-07-22T11:29:30.367Z
<Dhairya Parmar> considering the data you have
2024-07-22T11:36:09.818Z
<Dhairya Parmar> yeah, DNM has been removed and the `needs-qa` tag added
2024-07-22T12:38:24.155Z
<Frank Troost> Many images, JPEG and NEF files.
2024-07-22T12:38:57.882Z
<Frank Troost> Strange there is no status to get for it
2024-07-22T12:39:56.512Z
<Dhairya Parmar> what does `ceph tell mds.<cephfs>:0 scrub status` tell?
2024-07-22T13:16:01.482Z
<Frank Troost> Error ENOENT: problem getting command descriptions from mds.cephfs:0
2024-07-22T13:20:17.366Z
<Frank Troost> cluster:
  id:   331e39f1-c492-4d3e-a761-0f2b04702381
  health: HEALTH_WARN
      insufficient standby MDS daemons available

 services:
  mon: 2 daemons, quorum pve,dell (age 21h)
  mgr: pve(active, since 5d), standbys: dell
  mds: 2/2 daemons up
  osd: 10 osds: 10 up (since 21h), 10 in (since 4d)
  rgw: 1 daemon active (1 hosts, 1 zones)

 data:
  volumes: 2/2 healthy
  pools:  17 pools, 414 pgs
  objects: 8.58M objects, 8.9 TiB
  usage:  21 TiB used, 9.4 TiB / 30 TiB avail
  pgs:   413 active+clean
       1  active+clean+scrubbing+deep
2024-07-22T13:24:36.805Z
<Frank Troost> the last part makes me believe that the command is doing some scrubbing.
Furthermore, I see all the MDSs went down except 2: one for longhorn, one for cephfs. All the others stopped.

2024-07-22T15:22:34.056+0200 712dcd79e040 -1 client.47529960 resolve_mds: no MDS daemons found by name `cephfs:all'
2024-07-22T15:22:34.056+0200 712dcd79e040 -1 client.47529960 FSMap: cephfs:1 longhorn:1 {cephfs:0=pve-4=up:active(laggy or crashed),longhorn:0=pve-3=up:active}
Error ENOENT: problem getting command descriptions from mds.cephfs:all
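The `cephfs:all` spec has to be resolved against the FSMap, and rank 0 shows up there as `laggy or crashed`, which is likely why the tell fails. Addressing a single rank or the daemon by its name usually works once the MDS is responsive; a sketch using the names from the FSMap above:
```
# address rank 0 of the filesystem
ceph tell mds.cephfs:0 scrub status

# or address the daemon directly by name (pve-4 is rank 0 per the FSMap)
ceph tell mds.pve-4 scrub status

# list ranks, daemon names and states for the filesystem
ceph fs status cephfs
```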
2024-07-22T13:45:55.487Z
<Frank Troost> seems like the MDSs are now broken. Errors:

Jul 22 15:43:51 pve ceph-mds[179841]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Jul 22 15:43:51 pve ceph-mds[179841]:     -5> 2024-07-22T15:43:51.755+0200 79fe186006c0 -1 mds.0.cache.den(0x1 lost+found) newly corrupt dentry to be committed: [dentry #0x1/lost+found [head,head] auth (dversion lock) pv=588090 v=588088 ino=0x4 state=1610612736 | inodepin=1 dirty=1 0x5b3897ea6780]
Jul 22 15:43:51 pve ceph-mds[179841]:     -1> 2024-07-22T15:43:51.756+0200 79fe186006c0 -1 ./src/mds/MDCache.cc: In function 'void MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)' thread 79fe186006c0 time 2024-07-22T15:43:51.755758+0200
2024-07-22T14:58:03.482Z
<Frank Troost> I managed to get everything working again by failing the fs, recovering dentries, and marking the fs as repaired and joinable.
After this I started the scrub again. It seems to be running fine now: "status": "scrub active (225502 inodes in the stack)". That number is going down, so let's wait and pray.
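Frank did not post his exact commands; the usual shape of that recovery, assuming the filesystem is named `cephfs` and skipping the more destructive journal and table resets, is roughly:
```
# take the filesystem offline
ceph fs fail cephfs

# replay what can be salvaged from the rank 0 journal into the backing store
cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary

# clear the damaged/failed state on rank 0 and let MDS daemons join again
ceph mds repaired cephfs:0
ceph fs set cephfs joinable true

# then restart the scrub
ceph tell mds.cephfs:0 scrub start / recursive,force
```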
2024-07-22T16:55:26.146Z
<Frank Troost> Scrubbing of the filesystem is finished. Sadly the files are still broken. When I try to open a file, it says damaged metadata.
2024-07-22T17:21:24.040Z
<gregsfortytwo> The RADOS object name is <hex-encoded inode number of the directory>.<shard>; shard starts at zero and will go up as the MDS splits the directory into multiple objects when it grows
2024-07-22T17:27:05.969Z
<Wes Dillingham> Hey, thanks for the leads. In this case the object name from the scrub error is: `49:b696a4d0:::1001ecbe90b.00000000:head`
2024-07-22T17:27:42.193Z
<Wes Dillingham> 49 is the cephfs-metadata pool, are you saying b696a4d0 is the hex-encoded inode number of the directory?
2024-07-22T17:28:10.643Z
<Wes Dillingham> I do know there is a command to map inodes to dirs from the MDS admin socket
2024-07-22T17:33:25.274Z
<gregsfortytwo> Oh you pulled that out of an osd log? So that's <pool>:<pg hashid or some such>:::<hex inode number>.<shard>:<snapid>
2024-07-22T18:36:50.341Z
<Wes Dillingham> the cluster log actually:
```2024-07-15T05:29:51.296263+0000 osd.261 (osd.261) 167381 : cluster [WRN] Large omap object found. Object: 49:b696a4d0:::1001ecbe90b.00000000:head PG: 49.b25696d (49.6d) Key count: 235597 Size (bytes): 114971830
2024-07-15T05:40:00.000207+0000 mon.ceph001 (mon.0) 30369746 : cluster [WRN]     Search the cluster log for 'Large omap object found' for more details.```
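That warning fires when a single object's omap exceeds the OSD thresholds (200000 keys or 1 GiB by default); 235597 keys on a metadata-pool dirfrag object usually just means one very large directory. The thresholds can be read back with:
```
# large-omap warning thresholds (defaults: 200000 keys / 1 GiB)
ceph config get osd osd_deep_scrub_large_omap_object_key_threshold
ceph config get osd osd_deep_scrub_large_omap_object_size_threshold
```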
2024-07-22T18:39:54.547Z
<Wes Dillingham> previously i used: `ceph tell mds.scratchfs.ceph004.kijema dump inode 0x1001e4d83ae | jq -r .path` to get a path from an inode
2024-07-22T18:41:22.753Z
<Wes Dillingham> in this case the inode would be 1001ecbe90b ?
2024-07-22T18:42:19.938Z
<Wes Dillingham> if i try `ceph tell mds.scratchfs.ceph004.qufxye dump inode 0x1001ecbe90b` i am getting "dump inode failed, wrong inode number or the inode is not cached" I may have the inode incorrect
2024-07-22T18:43:04.007Z
<Wes Dillingham> it's a single active MDS FS and the active MDS is being contacted in the tell
2024-07-22T18:46:13.357Z
<Wes Dillingham> ```dump inode <number:int>                 dump inode by inode number```
2024-07-22T19:58:46.182Z
<gregsfortytwo> Annoyingly, you have to convert from hex to regular decimal for most commands
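That is, something like the following, reusing the daemon name and inode from above (`dump inode` only answers if the inode is currently in the MDS cache):
```
# hex inode from the object name -> decimal
printf '%d\n' 0x1001ecbe90b     # 1100028307723

# ask the active MDS for the path
ceph tell mds.scratchfs.ceph004.qufxye dump inode 1100028307723 | jq -r .path
```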
2024-07-22T21:23:54.776Z
<Wes Dillingham> hmm i get the same error when i convert hex to decimal specifically trying `dump inode 1100028307723`
2024-07-22T21:24:27.709Z
<Wes Dillingham> also trying `0x1100028307723` since that 0x seems to precede the inodes in the mds dump ops in flight
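One fallback when the inode is not in the MDS cache (not tried in this thread, and the pool name here is a guess for the metadata pool with id 49) is to read the `parent` backtrace xattr straight off the dirfrag object and decode it:
```
# dump the backtrace stored on the directory's first object in the metadata pool
rados -p cephfs_metadata getxattr 1001ecbe90b.00000000 parent > backtrace.bin

# decode it to JSON; the ancestors list gives the directory's path components
ceph-dencoder type inode_backtrace_t import backtrace.bin decode dump_json
```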
