ceph - ceph-devel - 2024-10-12

Timestamp (UTC) | Message
2024-10-12T13:44:17.175Z
<Alexander Patrakov> Hello team, I am looking at a cluster that was upgraded from 17.2.7 to 18.2.4 a few hours ago. The history is that there was an MDS crash and filesystem damage earlier while still on 17.2.7 when trying to change max_mds from 2 to 1 (this is the same as <https://tracker.ceph.com/issues/60986> where I already commented). I tried to scrub the whole filesystem back then, but it went OOM. I tried to scrub it again after the upgrade, but it seems to be looping over the same stuff and emits so many messages that systemd-journald eats ~70% CPU.

The scrub was started like this: `ceph tell mds.mainfs:0 scrub start  / recursive,repair,force,scrub_mdsdir`

A sample of 200 journal messages is in the attached file. Should I abort the scrub, wait for the snapshots (yes I know where exactly the bad dirs are) to expire, and retry? Or are there any better suggestions?

Is this really a loop or just too-repetitive messages? Is this a known issue? Thanks in advance.

Attachment: https://files.slack.com/files-pri/T1HG3J90S-F07RE2XD8F8/download/message_5_.txt
2024-10-12T14:11:02.337Z
<Alexander Patrakov> Hello team, I am looking at a cluster that was upgraded from 17.2.7 to 18.2.4 a few hours ago. The history is that there was an MDS crash and filesystem damage earlier while still on 17.2.7 when trying to change max_mds from 2 to 1 (this is the same as <https://tracker.ceph.com/issues/60986> where I already commented). I tried to scrub the whole filesystem back then, but it went OOM even with 128 GB of RAM + 480 GB of swap. I tried to scrub it again after the upgrade, but it seems to be looping over the same stuff and emits so many messages that systemd-journald eats ~70% CPU.

The scrub was started like this: `ceph tell mds.mainfs:0 scrub start  / recursive,repair,force,scrub_mdsdir`

A sample of 200 journal messages is in the attached file. Should I abort the scrub, wait for the snapshots (yes I know where exactly the bad dirs are, there is nothing important there, and they are only in snapshots) to expire, and retry? Or are there any better suggestions?

The scrub status is stuck for more than one hour already on this, i.e., the number of inodes in the stack is not changing:

```
{
    "status": "scrub active (253372 inodes in the stack)",
    "scrubs": {
        "e7091739-5bc7-4de3-8cb3-1a6bc7e2e82e": {
            "path": "/",
            "tag": "e7091739-5bc7-4de3-8cb3-1a6bc7e2e82e",
            "options": "recursive,repair,force,scrub_mdsdir"
        }
    }
}
```
Is this really a loop or just too-repetitive messages? Is this a known issue? Thanks in advance.
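
(Editor's sketch, not part of the original message.) One rough way to check whether the scrub is making progress is to sample the `scrub status` output a few times and compare the "inodes in the stack" figure. The Python below only parses status text shaped like the output quoted above; the function names `inodes_in_stack` and `looks_stalled` are hypothetical, and an unchanging count is a hint of a stall rather than proof of a loop.

```python
import json
import re


def inodes_in_stack(status_json):
    """Return the inode count from a scrub status reply, or None if the
    status string does not match the pattern seen in the message above."""
    status = json.loads(status_json)["status"]
    match = re.search(r"\((\d+) inodes in the stack\)", status)
    return int(match.group(1)) if match else None


def looks_stalled(samples):
    """True if there are at least two samples and every one reports the
    same inode count. This is a heuristic only."""
    counts = [inodes_in_stack(s) for s in samples]
    return len(counts) >= 2 and None not in counts and len(set(counts)) == 1


# Status text abbreviated from the output quoted in the message.
sample = '{"status": "scrub active (253372 inodes in the stack)", "scrubs": {}}'
print(inodes_in_stack(sample))
print(looks_stalled([sample, sample]))
```

In practice the samples would be collected some minutes apart, for example by saving the output of `ceph tell mds.mainfs:0 scrub status` to separate files.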
2024-10-12T16:37:02.189Z
<Alexander Patrakov> No progress, I have cancelled the scrub. I will scrub specific directories (that sum to the whole filesystem minus mdsdir) instead.
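
(Editor's sketch, not part of the original message.) The per-directory plan could be prepared by generating one `scrub start` command per top-level directory. The directory names below are hypothetical placeholders, and dropping `scrub_mdsdir` from the option list is an assumption based on the "minus mdsdir" remark. The code only prints the commands; it does not run them.

```python
def scrub_commands(mds, paths, options=("recursive", "repair", "force")):
    """Build one `ceph tell <mds> scrub start <path> <options>` argument
    list per path. Nothing is executed here."""
    opts = ",".join(options)
    return [["ceph", "tell", mds, "scrub", "start", path, opts] for path in paths]


# Hypothetical top-level directories that together cover the filesystem.
top_level_dirs = ["/home", "/projects"]

for cmd in scrub_commands("mds.mainfs:0", top_level_dirs):
    print(" ".join(cmd))
```

Printing the commands first allows each one to be reviewed and started individually, so that a directory that stalls can be identified and aborted without restarting the rest.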
