ceph - cephadm - 2024-09-11

Timestamp (UTC) | Message
2024-09-11T11:15:26.405Z
<Eugen Block> By the way, have you restarted the OSDs on the affected host? Shouldn't they show as on the correct host in the crushmap?
2024-09-11T11:39:07.634Z
<verdurin> They are running, yes.
They show in the crushmap as being on the host with the incorrect name.
2024-09-11T12:53:41.032Z
<Eugen Block> I think it would help to have the full picture, especially since two weeks have passed. Could you paste `ceph orch host ls` and `ceph osd tree`? Redact sensitive data, if necessary.
2024-09-11T12:57:38.587Z
<verdurin> Will do - after my very late lunch.
2024-09-11T13:39:20.057Z
<verdurin> Edited `ceph orch host ls` output:
```HOST                              ADDR            LABELS
cephg-p000.ceph  10.xxx.xxx.90   mds nfs rgw rbd.client
cephg-p002.ceph  10.xxx.xxx.92
cephg-p003.ceph  10.xxx.xxx.93
cephm-p000.ceph  10.xxx.xxx.200  _admin
cephm-p001.ceph  10.xxx.xxx.201  _admin
cephm-p002.ceph  10.xxx.xxx.202  _admin
cephm-p003.ceph  10.xxx.xxx.203  _admin
cephm-p004.ceph  10.xxx.xxx.204  _admin
cepho-p000.ceph  10.xxx.xxx.100  osd
cepho-p001.ceph  10.xxx.xxx.101  osd _no_schedule
cepho-p002.ceph  10.xxx.xxx.102  osd
cepho-p003.ceph  10.xxx.xxx.103  osd
cepho-p004.ceph  10.xxx.xxx.104  osd _admin
cepho-p005.ceph  10.xxx.xxx.105  osd _admin
cepho-p006.ceph  10.xxx.xxx.106  osd
cepho-p007.ceph  10.xxx.xxx.107  osd
cepho-p008.ceph  10.xxx.xxx.108  osd
cepho-p009.ceph  10.xxx.xxx.109  osd
cepho-p010.ceph  10.xxx.xxx.110  osd
cepho-p011.ceph  10.xxx.xxx.111  osd
20 hosts in cluster```
2024-09-11T13:39:33.091Z
<verdurin> `p008` is the problematic one.
2024-09-11T13:43:47.649Z
<verdurin> `ceph osd tree`  :
```ID   CLASS  WEIGHT      TYPE NAME                         STATUS     REWEIGHT  PRI-AFF
 -1         2168.57397  root default
 -3          633.80457      datacenter xx0
-12          633.80457          rack rack-27
-35          365.99744              host cepho-p000
  3    hdd    20.33319                  osd.3                    up   1.00000  1.00000
 13    hdd    20.33319                  osd.13                   up   1.00000  1.00000
 19    hdd    20.33319                  osd.19                   up   1.00000  1.00000
 30    hdd    20.33319                  osd.30                   up   1.00000  1.00000
 39    hdd    20.33319                  osd.39                   up   1.00000  1.00000
 47    hdd    20.33319                  osd.47                   up   1.00000  1.00000
 55    hdd    20.33319                  osd.55                   up   1.00000  1.00000
 62    hdd    20.33319                  osd.62                   up   1.00000  1.00000
 70    hdd    20.33319                  osd.70                   up   1.00000  1.00000
 77    hdd    20.33319                  osd.77                   up   1.00000  1.00000
 86    hdd    20.33319                  osd.86                   up   1.00000  1.00000
 94    hdd    20.33319                  osd.94                   up   1.00000  1.00000
103    hdd    20.33319                  osd.103                  up   1.00000  1.00000
111    hdd    20.33319                  osd.111                  up   1.00000  1.00000
119    hdd    20.33319                  osd.119                  up   1.00000  1.00000
127    hdd    20.33319                  osd.127                  up   1.00000  1.00000
135    hdd    20.33319                  osd.135                  up   1.00000  1.00000
143    hdd    20.33319                  osd.143                  up   1.00000  1.00000
-27                  0              host cepho-p001
 28    hdd           0                  osd.28                   up   1.00000  1.00000
-23          133.90356              host cepho-p002
  7    hdd     7.43909                  osd.7                    up   1.00000  1.00000
  8    hdd     7.43909                  osd.8                    up   1.00000  1.00000
 17    hdd     7.43909                  osd.17                   up   1.00000  1.00000
 24    hdd     7.43909                  osd.24                   up   1.00000  1.00000
 32    hdd     7.43909                  osd.32                   up   1.00000  1.00000
 40    hdd     7.43909                  osd.40                   up   1.00000  1.00000
 50    hdd     7.43909                  osd.50                   up   1.00000  1.00000
 57    hdd     7.43909                  osd.57                   up   1.00000  1.00000
 64    hdd     7.43909                  osd.64                   up   1.00000  1.00000
 72    hdd     7.43909                  osd.72                   up   1.00000  1.00000
 83    hdd     7.43909                  osd.83                   up   1.00000  1.00000
 89    hdd     7.43909                  osd.89                   up   1.00000  1.00000
 98    hdd     7.43909                  osd.98                   up   1.00000  1.00000
105    hdd     7.43909                  osd.105                  up   1.00000  1.00000
114    hdd     7.43909                  osd.114                  up   1.00000  1.00000
122    hdd     7.43909                  osd.122                  up   1.00000  1.00000
129    hdd     7.43909                  osd.129                  up   1.00000  1.00000
136    hdd     7.43909                  osd.136                  up   1.00000  1.00000
-31          133.90356              host cepho-p003
  1    hdd     7.43909                  osd.1                    up   1.00000  1.00000
 14    hdd     7.43909                  osd.14                   up   1.00000  1.00000
 22    hdd     7.43909                  osd.22                   up   1.00000  1.00000
 29    hdd     7.43909                  osd.29                   up   1.00000  1.00000
 37    hdd     7.43909                  osd.37                   up   1.00000  1.00000
 46    hdd     7.43909                  osd.46                   up   1.00000  1.00000
 54    hdd     7.43909                  osd.54                   up   1.00000  1.00000
 61    hdd     7.43909                  osd.61                   up   1.00000  1.00000
 69    hdd     7.43909                  osd.69                   up   1.00000  1.00000
 78    hdd     7.43909                  osd.78                   up   1.00000  1.00000
 85    hdd     7.43909                  osd.85                   up   1.00000  1.00000
 93    hdd     7.43909                  osd.93                   up   1.00000  1.00000
101    hdd     7.43909                  osd.101                  up   1.00000  1.00000
109    hdd     7.43909                  osd.109                  up   1.00000  1.00000
116    hdd     7.43909                  osd.116                  up   1.00000  1.00000
124    hdd     7.43909                  osd.124                  up   1.00000  1.00000
132    hdd     7.43909                  osd.132                  up   1.00000  1.00000
140    hdd     7.43909                  osd.140                  up   1.00000  1.00000
-39          767.22302      datacenter xx1
-19          767.22302          rack rack-xx1-0
 -2                  0              host cepho-p008
-41          365.99744              host cepho-p008-ceph
144    hdd    20.33319                  osd.144                  up   1.00000  1.00000
145    hdd    20.33319                  osd.145                  up   1.00000  1.00000
146    hdd    20.33319                  osd.146                  up   1.00000  1.00000
147    hdd    20.33319                  osd.147                  up   1.00000  1.00000
148    hdd    20.33319                  osd.148                  up   1.00000  1.00000
149    hdd    20.33319                  osd.149                  up   1.00000  1.00000
150    hdd    20.33319                  osd.150                  up   1.00000  1.00000
151    hdd    20.33319                  osd.151                  up   1.00000  1.00000
152    hdd    20.33319                  osd.152                  up   1.00000  1.00000
153    hdd    20.33319                  osd.153                  up   1.00000  1.00000
154    hdd    20.33319                  osd.154                  up   1.00000  1.00000
155    hdd    20.33319                  osd.155                  up   1.00000  1.00000
156    hdd    20.33319                  osd.156                  up   1.00000  1.00000
157    hdd    20.33319                  osd.157                  up   1.00000  1.00000
158    hdd    20.33319                  osd.158                  up   1.00000  1.00000
159    hdd    20.33319                  osd.159                  up   1.00000  1.00000
160    hdd    20.33319                  osd.160                  up   1.00000  1.00000
161    hdd    20.33319                  osd.161                  up   1.00000  1.00000
 -5          133.74187              host cepho-p009
102    hdd     7.27739                  osd.102                  up   1.00000  1.00000
162    hdd     7.43909                  osd.162                  up   1.00000  1.00000
163    hdd     7.43909                  osd.163                  up   1.00000  1.00000
164    hdd     7.43909                  osd.164                  up   1.00000  1.00000
166    hdd     7.43909                  osd.166                  up   1.00000  1.00000
167    hdd     7.43909                  osd.167                  up   1.00000  1.00000
168    hdd     7.43909                  osd.168                  up   1.00000  1.00000
169    hdd     7.43909                  osd.169                  up   1.00000  1.00000
170    hdd     7.43909                  osd.170                  up   1.00000  1.00000
171    hdd     7.43909                  osd.171                  up   1.00000  1.00000
172    hdd     7.43909                  osd.172                  up   1.00000  1.00000
173    hdd     7.43909                  osd.173                  up   1.00000  1.00000
174    hdd     7.43909                  osd.174                  up   1.00000  1.00000
175    hdd     7.43909                  osd.175                  up   1.00000  1.00000
176    hdd     7.43909                  osd.176                  up   1.00000  1.00000
177    hdd     7.43909                  osd.177                  up   1.00000  1.00000
178    hdd     7.43909                  osd.178                  up   1.00000  1.00000
179    hdd     7.43909                  osd.179                  up   1.00000  1.00000
 -7          133.74187              host cepho-p010
180    hdd     7.43909                  osd.180           destroyed         0  1.00000
181    hdd     7.43909                  osd.181                  up   1.00000  1.00000
183    hdd     7.43909                  osd.183                  up   1.00000  1.00000
184    hdd     7.43909                  osd.184                  up   1.00000  1.00000
185    hdd     7.43909                  osd.185                  up   1.00000  1.00000
186    hdd     7.43909                  osd.186                  up   1.00000  1.00000
187    hdd     7.43909                  osd.187                  up   1.00000  1.00000
188    hdd     7.43909                  osd.188                  up   1.00000  1.00000
189    hdd     7.43909                  osd.189                  up   1.00000  1.00000
190    hdd     7.43909                  osd.190                  up   1.00000  1.00000
191    hdd     7.43909                  osd.191                  up   1.00000  1.00000
192    hdd     7.43909                  osd.192                  up   1.00000  1.00000
193    hdd     7.43909                  osd.193                  up   1.00000  1.00000
194    hdd     7.43909                  osd.194                  up   1.00000  1.00000
195    hdd     7.43909                  osd.195                  up   1.00000  1.00000
196    hdd     7.43909                  osd.196                  up   1.00000  1.00000
197    hdd     7.43909                  osd.197                  up   1.00000  1.00000
218    hdd     7.27739                  osd.218                  up   1.00000  1.00000
-10          133.74187              host cepho-p011
198    hdd     7.43909                  osd.198                  up   1.00000  1.00000
199    hdd     7.43909                  osd.199                  up   1.00000  1.00000
200    hdd     7.43909                  osd.200                  up   1.00000  1.00000
201    hdd     7.43909                  osd.201                  up   1.00000  1.00000
203    hdd     7.43909                  osd.203                  up   1.00000  1.00000
204    hdd     7.43909                  osd.204                  up   1.00000  1.00000
205    hdd     7.43909                  osd.205                  up   1.00000  1.00000
206    hdd     7.43909                  osd.206                  up   1.00000  1.00000
207    hdd     7.43909                  osd.207                  up   1.00000  1.00000
208    hdd     7.43909                  osd.208                  up   1.00000  1.00000
209    hdd     7.43909                  osd.209                  up   1.00000  1.00000
210    hdd     7.43909                  osd.210                  up   1.00000  1.00000
211    hdd     7.43909                  osd.211                  up   1.00000  1.00000
212    hdd     7.43909                  osd.212                  up   1.00000  1.00000
213    hdd     7.43909                  osd.213                  up   1.00000  1.00000
214    hdd     7.43909                  osd.214                  up   1.00000  1.00000
215    hdd     7.43909                  osd.215                  up   1.00000  1.00000
217    hdd     7.27739                  osd.217                  up   1.00000  1.00000```
2024-09-11T13:44:49.896Z
<verdurin> Continued:
``` -8          767.54645      datacenter xx2
-17          267.80713          rack rack-7
-33          133.90356              host cepho-p006
  0    hdd     7.43909                  osd.0                    up   1.00000  1.00000
 12    hdd     7.43909                  osd.12                   up   1.00000  1.00000
 21    hdd     7.43909                  osd.21                   up   1.00000  1.00000
 27    hdd     7.43909                  osd.27                   up   1.00000  1.00000
 33    hdd     7.43909                  osd.33                   up   1.00000  1.00000
 43    hdd     7.43909                  osd.43                   up   1.00000  1.00000
 52    hdd     7.43909                  osd.52                   up   1.00000  1.00000
 60    hdd     7.43909                  osd.60                   up   1.00000  1.00000
 68    hdd     7.43909                  osd.68                   up   1.00000  1.00000
 76    hdd     7.43909                  osd.76                   up   1.00000  1.00000
 84    hdd     7.43909                  osd.84                   up   1.00000  1.00000
 92    hdd     7.43909                  osd.92                   up   1.00000  1.00000
100    hdd     7.43909                  osd.100                  up   1.00000  1.00000
108    hdd     7.43909                  osd.108                  up   1.00000  1.00000
117    hdd     7.43909                  osd.117                  up   1.00000  1.00000
126    hdd     7.43909                  osd.126                  up   1.00000  1.00000
133    hdd     7.43909                  osd.133                  up   1.00000  1.00000
141    hdd     7.43909                  osd.141                  up   1.00000  1.00000
-25          133.90356              host cepho-p007
  5    hdd     7.43909                  osd.5                    up   1.00000  1.00000
  9    hdd     7.43909                  osd.9                    up   1.00000  1.00000
 20    hdd     7.43909                  osd.20                   up   1.00000  1.00000
 26    hdd     7.43909                  osd.26                   up   1.00000  1.00000
 36    hdd     7.43909                  osd.36                   up   1.00000  1.00000
 41    hdd     7.43909                  osd.41                   up   1.00000  1.00000
 48    hdd     7.43909                  osd.48                   up   1.00000  1.00000
 56    hdd     7.43909                  osd.56                   up   1.00000  1.00000
 65    hdd     7.43909                  osd.65                   up   1.00000  1.00000
 73    hdd     7.43909                  osd.73                   up   1.00000  1.00000
 80    hdd     7.43909                  osd.80                   up   1.00000  1.00000
 88    hdd     7.43909                  osd.88                   up   1.00000  1.00000
 96    hdd     7.43909                  osd.96                   up   1.00000  1.00000
104    hdd     7.43909                  osd.104                  up   1.00000  1.00000
112    hdd     7.43909                  osd.112                  up   1.00000  1.00000
120    hdd     7.43909                  osd.120                  up   1.00000  1.00000
128    hdd     7.43909                  osd.128                  up   1.00000  1.00000
137    hdd     7.43909                  osd.137                  up   1.00000  1.00000
-14          499.73932          rack rack-8
-37          365.99744              host cepho-p004
  6    hdd    20.33319                  osd.6                    up   1.00000  1.00000
 15    hdd    20.33319                  osd.15                   up   1.00000  1.00000
 23    hdd    20.33319                  osd.23                   up   1.00000  1.00000
 31    hdd    20.33319                  osd.31                   up   1.00000  1.00000
 38    hdd    20.33319                  osd.38                   up   1.00000  1.00000
 45    hdd    20.33319                  osd.45                   up   1.00000  1.00000
 53    hdd    20.33319                  osd.53                   up   1.00000  1.00000
 63    hdd    20.33319                  osd.63                   up   1.00000  1.00000
 71    hdd    20.33319                  osd.71                   up   1.00000  1.00000
 79    hdd    20.33319                  osd.79                   up   1.00000  1.00000
 87    hdd    20.33319                  osd.87                   up   1.00000  1.00000
 95    hdd    20.33319                  osd.95                   up   1.00000  1.00000
110    hdd    20.33319                  osd.110                  up   1.00000  1.00000
118    hdd    20.33319                  osd.118                  up   1.00000  1.00000
125    hdd    20.33319                  osd.125                  up   1.00000  1.00000
134    hdd    20.33319                  osd.134                  up   1.00000  1.00000
142    hdd    20.33319                  osd.142                  up   1.00000  1.00000
216    hdd    20.33319                  osd.216                  up   1.00000  1.00000
-29          133.74187              host cepho-p005
  4    hdd     7.43909                  osd.4                    up   1.00000  1.00000
 10    hdd     7.43909                  osd.10                   up   1.00000  1.00000
 18    hdd     7.43909                  osd.18                   up   1.00000  1.00000
 25    hdd     7.43909                  osd.25                   up   1.00000  1.00000
 35    hdd     7.43909                  osd.35                   up   1.00000  1.00000
 44    hdd     7.43909                  osd.44                   up   1.00000  1.00000
 51    hdd     7.43909                  osd.51                   up   1.00000  1.00000
 59    hdd     7.43909                  osd.59                   up   1.00000  1.00000
 66    hdd     7.43909                  osd.66                   up   1.00000  1.00000
 75    hdd     7.43909                  osd.75                   up   1.00000  1.00000
 81    hdd     7.43909                  osd.81                   up   1.00000  1.00000
 91    hdd     7.43909                  osd.91                   up   1.00000  1.00000
 99    hdd     7.43909                  osd.99                   up   1.00000  1.00000
107    hdd     7.43909                  osd.107                  up   1.00000  1.00000
115    hdd     7.43909                  osd.115                  up   1.00000  1.00000
123    hdd     7.43909                  osd.123                  up   1.00000  1.00000
130    hdd     7.27739                  osd.130                  up   1.00000  1.00000
138    hdd     7.43909                  osd.138                  up   1.00000  1.00000```
2024-09-11T13:57:41.222Z
<Benard> I am using cephadm to configure some services, particularly SSL certificates for RGW. I have a yml definition file, and when I update that file and apply it I can see that the config in cephadm gets updated; however, the haproxy containers are still using the old cert.

Is there a way for me to apply that config properly short of removing and redeploying the service?
2024-09-11T13:58:53.345Z
<Ken Carlile> I ran into this too, let me check my notes to see if I actually noted how to get around it
2024-09-11T13:59:09.105Z
<Ken Carlile> I'm just about to be in a meeting, but I'll check afterwards.
2024-09-11T13:59:52.633Z
<Benard> Thanks Ken
2024-09-11T14:01:18.082Z
<Ken Carlile> ```When cert is changed, need to change the spec file, apply the spec file, then redeploy the services:
ceph orch apply -i rgw_spec.yaml
ceph orch apply -i rgw-ingress_spec.yaml
ceph orch redeploy rgw.s3rgw
ceph orch redeploy ingress.s3rgw.ingress```
2024-09-11T14:01:21.663Z
<Ken Carlile> that's from my notes
2024-09-11T14:01:29.035Z
<Ken Carlile> so yes, you need to re-apply and then redeploy
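For reference, an ingress spec carrying the certificate might look roughly like this — a sketch only: the service names, placement count, IP, and cert contents are hypothetical, though the field names follow the cephadm ingress service spec:

```shell
# Hypothetical spec file; the ssl_cert block should contain the full
# chain followed by the private key (see the full-chain gotcha below).
cat > rgw-ingress_spec.yaml <<'EOF'
service_type: ingress
service_id: rgw.s3rgw
placement:
  count: 2
spec:
  backend_service: rgw.s3rgw
  virtual_ip: 10.0.0.50/24
  frontend_port: 443
  monitor_port: 1967
  ssl_cert: |
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
    -----BEGIN PRIVATE KEY-----
    ...
    -----END PRIVATE KEY-----
EOF
# Applying the spec updates the stored config; the redeploy is what
# actually pushes the new cert into the running haproxy containers.
ceph orch apply -i rgw-ingress_spec.yaml
ceph orch redeploy ingress.s3rgw.ingress
```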
2024-09-11T14:01:40.366Z
<Ken Carlile> at least in my whole vast experience of using this for 2 weeks kinda
2024-09-11T14:01:44.945Z
<Ken Carlile> it was pretty painless
2024-09-11T14:02:11.421Z
<Benard> Thanks for confirming Ken 👍
2024-09-11T14:02:32.144Z
<Ken Carlile> still waiting for the meeting to start 😛
2024-09-11T14:25:24.532Z
<Ken Carlile> No worries. The other gotcha I ran into (and why I know about replacing the cert at all) is that it needs the crt with the full chain. At least in our case.
2024-09-11T14:33:19.229Z
<Benard> I remember running into this somewhere else, but in my case in this environment it worked fine with just the normal cert 🤷‍♂️
2024-09-11T14:36:16.700Z
<Ken Carlile> Maybe because we use a wildcard cert. Got me, certs are very much a thing that I barely have a handle on understanding.
2024-09-11T14:38:33.914Z
<Eugen Block> So just to be on the same page: the `ceph orch host ls` output is correct, but in the crush tree the name cepho-p008 is the correct one (though it's empty), while the OSDs are underneath cepho-p008-ceph. Are you sure you restarted the OSDs on p008 after you fixed the hostname? If not, I'd give that a try, maybe reboot the entire host to ensure the hostname is correct. I expect the OSDs to be reported underneath p008 after a reboot/restart.
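Restarting just the OSD daemons on that host can be scripted through the orchestrator — a sketch, assuming the OSD IDs can be read from the misnamed crush bucket shown in the tree above:

```shell
# List the OSD IDs under the misnamed bucket (cepho-p008-ceph in the
# tree above) and restart each daemon via the orchestrator.
for id in $(ceph osd ls-tree cepho-p008-ceph); do
    ceph orch daemon restart "osd.${id}"
done
```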
2024-09-11T14:41:21.653Z
<Eugen Block> And if that doesn't work either, you could rename the (currently empty) bucket cepho-p008 (cepho-p008-temp or something): ceph osd crush rename-bucket ...
And then rename cepho-p008-ceph to cepho-p008, that should work, shouldn't it?
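Spelled out, that rename sequence would look roughly like this (bucket names taken from the tree above; worth verifying the bucket is really empty before removing it):

```shell
# Move the empty but correctly named bucket out of the way...
ceph osd crush rename-bucket cepho-p008 cepho-p008-old
# ...then give the populated bucket the correct name.
ceph osd crush rename-bucket cepho-p008-ceph cepho-p008
# Optionally drop the leftover empty bucket afterwards.
ceph osd crush remove cepho-p008-old
```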
2024-09-11T14:51:03.020Z
<Wes Dillingham> Are these versioned buckets?
2024-09-11T14:57:24.715Z
<Wes Dillingham> I don't fully understand the interplay with versioned buckets, but we see large omaps and resharding issues (unable to stay below the omap key threshold for versioned buckets).  <https://tracker.ceph.com/issues/67884>
2024-09-11T15:47:32.256Z
<verdurin> Good point - I could certainly reboot it and see what happens.
2024-09-11T15:47:36.607Z
<Ken Carlile> Question on service count: is there an issue with running multiple alertmanagers? If I put the host that the single alertmanager is running on into maintenance, the dashboard kind of loses its mind. Can't find anything in the docs about this.
2024-09-11T15:50:11.423Z
<Ken Carlile> I don't have it specified for any particular host, but when setting a host to maintenance mode, it doesn't seem to trigger alertmanager starting on anything else.
2024-09-11T15:50:45.209Z
<Ken Carlile> For that matter, if I do start alertmanager on another host (by upping the instances to 2), I have to manually reconfigure the dashboard to use the other host. This is the same for prometheus.
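Scaling those monitoring services out and repointing the dashboard are two separate steps — a sketch, where the host name is hypothetical and 9093/9095 are the default alertmanager/prometheus ports:

```shell
# Run two instances of each so one survives a host going into maintenance.
ceph orch apply alertmanager --placement="count:2"
ceph orch apply prometheus --placement="count:2"
# The dashboard does not follow automatically; repoint it by hand
# (host here is an assumption -- substitute a surviving instance).
ceph dashboard set-alertmanager-api-host "http://cephm-p001.ceph:9093"
ceph dashboard set-prometheus-api-host "http://cephm-p001.ceph:9095"
```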
2024-09-11T17:00:31.976Z
<Michael W> We think we tracked down the issue with the Ceph large omap issues. It seems that our Veeam backups are set to immutable (to prevent deletion) and Ceph doesn't like that for handling object replication, so it complains of the large omap issues. Best we were told to do is just mute the alert for it.
2024-09-11T17:44:42.898Z
<drakonstein> You can also modify these settings to still get warned if you reach a higher value, but this is only really useful if you have a way to reduce the omap size
`osd_deep_scrub_large_omap_object_key_threshold`
`osd_deep_scrub_large_omap_object_value_sum_threshold`
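Raising such a threshold cluster-wide is a single config change — for example, lifting the key threshold from its default of 200,000 to 800,000 (the figure is illustrative):

```shell
# Warn later: raise the large-omap key threshold (default 200000).
ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 800000
# Confirm the active value afterwards.
ceph config get osd osd_deep_scrub_large_omap_object_key_threshold
```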
2024-09-11T17:45:17.802Z
<Wes Dillingham> in our case we quadrupled the value of the key threshold
2024-09-11T17:45:53.865Z
<Wes Dillingham> to 800,000 iirc
2024-09-11T17:45:57.895Z
<drakonstein> This blog talks about reducing the default value for the key threshold from 2,000,000 to 200,000 [https://42on.com/how-to-handle-large-omap-objects/#:~:text=The%20value%20of%20%E2%80%9Cosd_deep_scrub_large_omap_object_key_threshold%E2%80%9D%20determines,value%20for%20this%20is%20200000](https://42on.com/how-to-handle-large-omap-objects/#:~:text=The%20value%20of%20%E2%80%9Cosd_deep_scrub_large_omap_object_key_threshold%E2%80%9D%20determines,value%20for%20this%20is%20200000).