ceph - cephadm - 2024-09-10

2024-09-10T09:42:00.474Z
<verdurin> I'm looking at this again now that the backfilling has finished and I'm back.
Cephadm won't let me remove the original (correct) host from the host list, because it can see the OSDs running there.
Those are the OSDs actually running. In CRUSH, they are shown under the incorrect host.

Would prefer to avoid draining this host, but perhaps that's the cleanest way?

I share your nervousness about editing the config-key store.
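
(For reference, if draining does turn out to be the cleanest route, a minimal sketch of the cephadm drain-and-remove workflow - the hostname below is a placeholder:)
```
# Sketch only - hostname is a placeholder, adjust before running.
ceph orch host drain ceph-node-01     # schedule removal of all daemons/OSDs on the host
ceph orch osd rm status               # watch OSD drain/removal progress
ceph orch ps ceph-node-01             # confirm no daemons remain on the host
ceph orch host rm ceph-node-01        # finally remove the host from the cephadm inventory
```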
2024-09-10T13:53:54.036Z
<Michael W> Is it possible to have a shell script or something that can be run in the background on a Ceph monitor server to automatically reduce the "large omap objects", so we can keep the Ceph cluster in a HEALTH_OK state and not keep sitting in HEALTH_WARN? We are using SolarWinds to actively monitor our Ceph cluster, and our upper management team keeps reminding us that the cluster keeps showing as "yellow" (caution) rather than "green" (healthy). The cluster is deployed with cephadm on the Reef release.

ubuntu@jppddceph005:~$ sudo ceph -s
  cluster:
    id:     93c77433-1f7f-11ef-a53a-7d4403f765ec
    health: HEALTH_WARN
            2293 large omap objects

  services:
    mon: 5 daemons, quorum jppddceph005,jppddceph006,jppddceph007,jppddceph009,jppddceph008 (age 9w)
    mgr: jppddceph007.kwssbr(active, since 2w), standbys: jppddceph006.biykhq, jppddceph005.hurwhe
    osd: 288 osds: 288 up (since 34h), 288 in (since 4d)
    rgw: 48 daemons active (24 hosts, 1 zones)

  data:
    pools:   9 pools, 8801 pgs
    objects: 352.29M objects, 308 TiB
    usage:   1.1 PiB used, 4.2 PiB / 5.3 PiB avail
    pgs:     8713 active+clean
             56   active+clean+scrubbing+deep
             32   active+clean+scrubbing

  io:
    client:   2.7 MiB/s rd, 390 KiB/s wr, 1.06k op/s rd, 22 op/s wr
2024-09-10T14:00:39.085Z
<Brian P> Do you know the actual problem or are you asking about how to do a cronjob with Ceph?
2024-09-10T14:06:46.958Z
<Michael W> We are using our Ceph cluster buckets as a Veeam backup repository, so there are a lot of objects being stored. We just need a way to clean up all of these large omap object messages. We know these are usually just informational and don't really hurt performance, but we are trying to clean up the cluster so it looks healthy from a monitoring standpoint.
2024-09-10T14:07:10.213Z
<Joel Davidow> The large omap objects warning means either that a bucket index shard has more keys than the key-count threshold, or that a shard's total size exceeds the size threshold. Do you know what thresholds are configured? Check with `ceph config get osd osd_deep_scrub_large_omap_object_key_threshold` and `ceph config get osd osd_deep_scrub_large_omap_object_value_sum_threshold`
2024-09-10T14:09:04.150Z
<Michael W> ubuntu@jppddceph005:~$ sudo ceph config get osd osd_deep_scrub_large_omap_object_key_threshold
200000
2024-09-10T14:09:05.877Z
<Michael W> ubuntu@jppddceph005:~$ sudo ceph config get osd osd_deep_scrub_large_omap_object_value_sum_threshold
1073741824
2024-09-10T14:10:32.562Z
<Brian P> damn
2024-09-10T14:11:10.190Z
<Brian P> I would advise understanding the root problem, as Joel is hinting
2024-09-10T14:11:18.989Z
<Joel Davidow> ok, those are the defaults. does your cluster have dynamic resharding turned on? check with `ceph config get osd rgw_dynamic_resharding`
2024-09-10T14:12:07.148Z
<Brian P> (default true in Reef)
2024-09-10T14:12:14.474Z
<Michael W> ubuntu@jppddceph005:~$ sudo ceph config get osd rgw_dynamic_resharding
true
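
A couple of read-only checks (a sketch; the bucket name is a placeholder) that can show whether dynamic resharding is keeping up with bucket growth:
```
# Buckets currently queued for dynamic resharding (empty output means nothing pending)
radosgw-admin reshard list

# Current shard count for one bucket - recent releases include num_shards in bucket stats
radosgw-admin bucket stats --bucket=example-veeam-bucket | grep num_shards
```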
2024-09-10T14:13:59.789Z
<Michael W> ubuntu@jppddceph005:~$ sudo ceph health detail
HEALTH_WARN 2293 large omap objects; 4 pgs not deep-scrubbed in time
[WRN] LARGE_OMAP_OBJECTS: 2293 large omap objects
    2293 large objects found in pool 'default.rgw.buckets.index'
    Search the cluster log for 'Large omap object found' for more details.
[WRN] PG_NOT_DEEP_SCRUBBED: 4 pgs not deep-scrubbed in time
    pg 7.1eae not deep-scrubbed since 2024-08-26T11:07:24.807982+0000
    pg 7.19f9 not deep-scrubbed since 2024-08-26T06:10:02.644883+0000
    pg 7.1712 not deep-scrubbed since 2024-08-26T00:37:22.779376+0000
    pg 3.41 not deep-scrubbed since 2024-08-25T13:27:06.559377+0000
2024-09-10T14:15:18.895Z
<Joel Davidow> ok, what is the return from `radosgw-admin bucket limit check` ?
2024-09-10T14:17:35.740Z
<Michael W> Output of said command...: https://files.slack.com/files-pri/T1HG3J90S-F07LS79DLLA/download/ceph-radosgw-bucket-limit.txt
2024-09-10T14:32:07.473Z
<Joel Davidow> ok, so all the values for objects per shard are below the configured threshold, so the warning must be related to size. Next step is to `grep 'Large omap object found' /var/log/syslog` on a mon - if nothing turns up, retry the same command on the active mgr - then look for commonality across those log messages for clues.
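
One possible way to pull the common bucket markers out of those log lines (a sketch; the cluster log path varies - on cephadm deployments it is typically under /var/log/ceph/ on the mon host when file logging is enabled):
```
# Count 'Large omap object found' messages per bucket index marker so the worst
# buckets stand out. Index objects are named .dir.<marker>.<shard>; the trailing
# shard number is stripped here, but newer releases may also append a reshard
# generation, so treat the grouping as approximate. Adjust the log path as needed.
grep 'Large omap object found' /var/log/ceph/ceph.log \
  | grep -oE '\.dir\.[^:]+' \
  | sed -E 's/^\.dir\.//; s/\.[0-9]+$//' \
  | sort | uniq -c | sort -rn | head
```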
2024-09-10T15:10:38.122Z
<Michael W> Logged into the mgr and looked at the ceph.log file; it looks more like this -- 2024-09-10T13:21:02.275228+0000 mon.jppddceph005 (mon.0) 2895016 : cluster 3 Health check update: 2291 large omap objects (LARGE_OMAP_OBJECTS)
2024-09-10T13:26:01.056837+0000 osd.26 (osd.26) 351794 : cluster 3 Large omap object found. Object: 6:cc0aa183:::.dir.2671e75b-fe72-4544-b7dd-5cc2f166a85d.713688.3.6.222:head PG: 6.c1855033 (6.33) Key count: 230870 Size (bytes): 109203091
2024-09-10T13:26:01.056848+0000 osd.26 (osd.26) 351795 : cluster 3 Large omap object found. Object: 6:cc1a5884:::.dir.2671e75b-fe72-4544-b7dd-5cc2f166a85d.693489.3.6.614:head PG: 6.211a5833 (6.33) Key count: 216440 Size (bytes): 102358606
2024-09-10T13:26:01.056857+0000 osd.26 (osd.26) 351796 : cluster 3 Large omap object found. Object: 6:cc1b532e:::.dir.2671e75b-fe72-4544-b7dd-5cc2f166a85d.689350.2.2.36:head PG: 6.74cad833 (6.33) Key count: 361242 Size (bytes): 170131340
2024-09-10T13:26:40.381983+0000 osd.26 (osd.26) 351797 : cluster 3 Large omap object found. Object: 6:cc2f2e51:::.dir.2671e75b-fe72-4544-b7dd-5cc2f166a85d.714683.2.2.33:head PG: 6.8a74f433 (6.33) Key count: 249968 Size (bytes): 120291112
2024-09-10T15:36:20.095Z
<Joel Davidow> ok, so it looks like the key counts are actually the issue (above threshold), not size (below threshold) - I had mixed up user limits (`radosgw-admin bucket limit check`) with bucket index limits. You can get the bucket id for a given bucket name from `radosgw-admin bucket stats --bucket=<bucket_name>`. Check one bucket id to get the format (I believe 2671e75b-fe72-4544-b7dd-5cc2f166a85d.713688.3 should be a bucket id, but not sure if Reef has changed the format). If you repeat the grep and add `-v <bucket id>`, you can see if other buckets are involved. Repeating that `-v` pattern will give you the list of bucket ids involved, which you can then map to names by iterating through `radosgw-admin bucket stats --bucket=<bucket_name>`. At least then you’ll know which buckets are involved.
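
A possible way to build that bucket-name to bucket-id mapping for comparison against the log markers (a sketch, assuming `jq` is available on the host):
```
# Map every existing bucket name to its id so the ids seen in the large-omap
# log lines can be matched back to bucket names (or found to be missing).
for b in $(radosgw-admin bucket list | jq -r '.[]'); do
  id=$(radosgw-admin bucket stats --bucket="$b" | jq -r '.id')
  echo "$id $b"
done | sort > bucket-ids-by-name.txt
```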
2024-09-10T16:18:54.208Z
<Michael W> So this is the list of the buckets I ran the stats command against. What command do I need to run against the ID to get what you are looking for?: https://files.slack.com/files-pri/T1HG3J90S-F07M5P9E53K/download/ceph-radosgw-bucket-stats-list.txt
2024-09-10T16:40:33.056Z
<Joel Davidow> That return has it. For example, bucket jv-dd-winossql-vaprod has id 2671e75b-fe72-4544-b7dd-5cc2f166a85d.1057732.2. There are 10 buckets that return nothing - not sure what the story is with those. I'm also not seeing the bucket ids from the few large-omap log messages in the stats output, e.g. 2671e75b-fe72-4544-b7dd-5cc2f166a85d.713688.3 is not in the stats return. If you can sort out which buckets the large omap log messages belong to, then you can determine whether any are related to buckets that have been deleted, or are in buckets that you’re ok with losing data in. Is the cluster using rgw multi-site?
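
If some of the ids from the logs don't match any current bucket, they may belong to old bucket instances (e.g. left over from earlier reshards or deleted buckets). A read-only way to check, sketched under that assumption:
```
# Every bucket instance known to RGW, as "<bucket_name>:<bucket_id>" entries.
# Ids that appear in the large-omap log lines but not in bucket stats for any
# current bucket should show up here if they are stale instances.
radosgw-admin metadata list bucket.instance | jq -r '.[]' | sort > bucket-instances.txt
grep 2671e75b-fe72-4544-b7dd-5cc2f166a85d.713688.3 bucket-instances.txt
```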
2024-09-10T17:02:11.650Z
<drakonstein> To get your status green while working on this, you can mute the large omap alert.
```ceph health mute LARGE_OMAP_OBJECTS```
With large omap objects, the alert doesn't clear until the affected PG has been deep-scrubbed again - so if you change a setting, reshard a bucket, etc. expecting it to resolve, the warning will stay until that PG is deep scrubbed.
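
For example (a sketch; the TTL is arbitrary and the PG id is taken from the log lines above):
```
# Mute the warning for two weeks while investigating; --sticky keeps the mute
# even if the number of large omap objects changes.
ceph health mute LARGE_OMAP_OBJECTS 2w --sticky

# After resharding/cleanup, force a deep scrub of an affected PG so its
# large-omap records get refreshed and the warning can clear.
ceph pg deep-scrub 6.33
```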
2024-09-10T20:25:56.428Z
<Michael W> @Joel Davidow No, not using RGW multi-site.
2024-09-10T22:33:04.218Z
<Joel Davidow> This is as far as my experience with large omap objects goes, so I recommend posting to the ceph-users list with a summary of this initial investigation, including that the cluster isn’t multi-site and whether any of the objects in the logs belong to deleted buckets or to buckets that can’t lose data. Hopefully someone with more experience with this issue can help you with next steps.
