2024-09-11T11:15:26.405Z | <Eugen Block> By the way, have you restarted the OSDs on the affected host? Shouldn't they show as on the correct host in the crushmap? |
2024-09-11T11:39:07.634Z | <verdurin> They are running, yes.
They show in the crushmap as being on the host with the incorrect name. |
2024-09-11T12:53:41.032Z | <Eugen Block> I think it would help to have the full picture, especially since two weeks have passed. Could you paste `ceph orch host ls` and `ceph osd tree`? Redact sensitive data, if necessary. |
2024-09-11T12:57:38.587Z | <verdurin> Will do - after my very late lunch. |
2024-09-11T13:39:20.057Z | <verdurin> Edited `ceph orch host ls` output:
```HOST ADDR LABELS
cephg-p000.ceph 10.xxx.xxx.90 mds nfs rgw rbd.client
cephg-p002.ceph 10.xxx.xxx.92
cephg-p003.ceph 10.xxx.xxx.93
cephm-p000.ceph 10.xxx.xxx.200 _admin
cephm-p001.ceph 10.xxx.xxx.201 _admin
cephm-p002.ceph 10.xxx.xxx.202 _admin
cephm-p003.ceph 10.xxx.xxx.203 _admin
cephm-p004.ceph 10.xxx.xxx.204 _admin
cepho-p000.ceph 10.xxx.xxx.100 osd
cepho-p001.ceph 10.xxx.xxx.101 osd _no_schedule
cepho-p002.ceph 10.xxx.xxx.102 osd
cepho-p003.ceph 10.xxx.xxx.103 osd
cepho-p004.ceph 10.xxx.xxx.104 osd _admin
cepho-p005.ceph 10.xxx.xxx.105 osd _admin
cepho-p006.ceph 10.xxx.xxx.106 osd
cepho-p007.ceph 10.xxx.xxx.107 osd
cepho-p008.ceph 10.xxx.xxx.108 osd
cepho-p009.ceph 10.xxx.xxx.109 osd
cepho-p010.ceph 10.xxx.xxx.110 osd
cepho-p011.ceph 10.xxx.xxx.111 osd
20 hosts in cluster``` |
2024-09-11T13:39:33.091Z | <verdurin> `p008` is the problematic one. |
2024-09-11T13:43:47.649Z | <verdurin> `ceph osd tree` :
```ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 2168.57397 root default
-3 633.80457 datacenter xx0
-12 633.80457 rack rack-27
-35 365.99744 host cepho-p000
3 hdd 20.33319 osd.3 up 1.00000 1.00000
13 hdd 20.33319 osd.13 up 1.00000 1.00000
19 hdd 20.33319 osd.19 up 1.00000 1.00000
30 hdd 20.33319 osd.30 up 1.00000 1.00000
39 hdd 20.33319 osd.39 up 1.00000 1.00000
47 hdd 20.33319 osd.47 up 1.00000 1.00000
55 hdd 20.33319 osd.55 up 1.00000 1.00000
62 hdd 20.33319 osd.62 up 1.00000 1.00000
70 hdd 20.33319 osd.70 up 1.00000 1.00000
77 hdd 20.33319 osd.77 up 1.00000 1.00000
86 hdd 20.33319 osd.86 up 1.00000 1.00000
94 hdd 20.33319 osd.94 up 1.00000 1.00000
103 hdd 20.33319 osd.103 up 1.00000 1.00000
111 hdd 20.33319 osd.111 up 1.00000 1.00000
119 hdd 20.33319 osd.119 up 1.00000 1.00000
127 hdd 20.33319 osd.127 up 1.00000 1.00000
135 hdd 20.33319 osd.135 up 1.00000 1.00000
143 hdd 20.33319 osd.143 up 1.00000 1.00000
-27 0 host cepho-p001
28 hdd 0 osd.28 up 1.00000 1.00000
-23 133.90356 host cepho-p002
7 hdd 7.43909 osd.7 up 1.00000 1.00000
8 hdd 7.43909 osd.8 up 1.00000 1.00000
17 hdd 7.43909 osd.17 up 1.00000 1.00000
24 hdd 7.43909 osd.24 up 1.00000 1.00000
32 hdd 7.43909 osd.32 up 1.00000 1.00000
40 hdd 7.43909 osd.40 up 1.00000 1.00000
50 hdd 7.43909 osd.50 up 1.00000 1.00000
57 hdd 7.43909 osd.57 up 1.00000 1.00000
64 hdd 7.43909 osd.64 up 1.00000 1.00000
72 hdd 7.43909 osd.72 up 1.00000 1.00000
83 hdd 7.43909 osd.83 up 1.00000 1.00000
89 hdd 7.43909 osd.89 up 1.00000 1.00000
98 hdd 7.43909 osd.98 up 1.00000 1.00000
105 hdd 7.43909 osd.105 up 1.00000 1.00000
114 hdd 7.43909 osd.114 up 1.00000 1.00000
122 hdd 7.43909 osd.122 up 1.00000 1.00000
129 hdd 7.43909 osd.129 up 1.00000 1.00000
136 hdd 7.43909 osd.136 up 1.00000 1.00000
-31 133.90356 host cepho-p003
1 hdd 7.43909 osd.1 up 1.00000 1.00000
14 hdd 7.43909 osd.14 up 1.00000 1.00000
22 hdd 7.43909 osd.22 up 1.00000 1.00000
29 hdd 7.43909 osd.29 up 1.00000 1.00000
37 hdd 7.43909 osd.37 up 1.00000 1.00000
46 hdd 7.43909 osd.46 up 1.00000 1.00000
54 hdd 7.43909 osd.54 up 1.00000 1.00000
61 hdd 7.43909 osd.61 up 1.00000 1.00000
69 hdd 7.43909 osd.69 up 1.00000 1.00000
78 hdd 7.43909 osd.78 up 1.00000 1.00000
85 hdd 7.43909 osd.85 up 1.00000 1.00000
93 hdd 7.43909 osd.93 up 1.00000 1.00000
101 hdd 7.43909 osd.101 up 1.00000 1.00000
109 hdd 7.43909 osd.109 up 1.00000 1.00000
116 hdd 7.43909 osd.116 up 1.00000 1.00000
124 hdd 7.43909 osd.124 up 1.00000 1.00000
132 hdd 7.43909 osd.132 up 1.00000 1.00000
140 hdd 7.43909 osd.140 up 1.00000 1.00000
-39 767.22302 datacenter xx1
-19 767.22302 rack rack-xx1-0
-2 0 host cepho-p008
-41 365.99744 host cepho-p008-ceph
144 hdd 20.33319 osd.144 up 1.00000 1.00000
145 hdd 20.33319 osd.145 up 1.00000 1.00000
146 hdd 20.33319 osd.146 up 1.00000 1.00000
147 hdd 20.33319 osd.147 up 1.00000 1.00000
148 hdd 20.33319 osd.148 up 1.00000 1.00000
149 hdd 20.33319 osd.149 up 1.00000 1.00000
150 hdd 20.33319 osd.150 up 1.00000 1.00000
151 hdd 20.33319 osd.151 up 1.00000 1.00000
152 hdd 20.33319 osd.152 up 1.00000 1.00000
153 hdd 20.33319 osd.153 up 1.00000 1.00000
154 hdd 20.33319 osd.154 up 1.00000 1.00000
155 hdd 20.33319 osd.155 up 1.00000 1.00000
156 hdd 20.33319 osd.156 up 1.00000 1.00000
157 hdd 20.33319 osd.157 up 1.00000 1.00000
158 hdd 20.33319 osd.158 up 1.00000 1.00000
159 hdd 20.33319 osd.159 up 1.00000 1.00000
160 hdd 20.33319 osd.160 up 1.00000 1.00000
161 hdd 20.33319 osd.161 up 1.00000 1.00000
-5 133.74187 host cepho-p009
102 hdd 7.27739 osd.102 up 1.00000 1.00000
162 hdd 7.43909 osd.162 up 1.00000 1.00000
163 hdd 7.43909 osd.163 up 1.00000 1.00000
164 hdd 7.43909 osd.164 up 1.00000 1.00000
166 hdd 7.43909 osd.166 up 1.00000 1.00000
167 hdd 7.43909 osd.167 up 1.00000 1.00000
168 hdd 7.43909 osd.168 up 1.00000 1.00000
169 hdd 7.43909 osd.169 up 1.00000 1.00000
170 hdd 7.43909 osd.170 up 1.00000 1.00000
171 hdd 7.43909 osd.171 up 1.00000 1.00000
172 hdd 7.43909 osd.172 up 1.00000 1.00000
173 hdd 7.43909 osd.173 up 1.00000 1.00000
174 hdd 7.43909 osd.174 up 1.00000 1.00000
175 hdd 7.43909 osd.175 up 1.00000 1.00000
176 hdd 7.43909 osd.176 up 1.00000 1.00000
177 hdd 7.43909 osd.177 up 1.00000 1.00000
178 hdd 7.43909 osd.178 up 1.00000 1.00000
179 hdd 7.43909 osd.179 up 1.00000 1.00000
-7 133.74187 host cepho-p010
180 hdd 7.43909 osd.180 destroyed 0 1.00000
181 hdd 7.43909 osd.181 up 1.00000 1.00000
183 hdd 7.43909 osd.183 up 1.00000 1.00000
184 hdd 7.43909 osd.184 up 1.00000 1.00000
185 hdd 7.43909 osd.185 up 1.00000 1.00000
186 hdd 7.43909 osd.186 up 1.00000 1.00000
187 hdd 7.43909 osd.187 up 1.00000 1.00000
188 hdd 7.43909 osd.188 up 1.00000 1.00000
189 hdd 7.43909 osd.189 up 1.00000 1.00000
190 hdd 7.43909 osd.190 up 1.00000 1.00000
191 hdd 7.43909 osd.191 up 1.00000 1.00000
192 hdd 7.43909 osd.192 up 1.00000 1.00000
193 hdd 7.43909 osd.193 up 1.00000 1.00000
194 hdd 7.43909 osd.194 up 1.00000 1.00000
195 hdd 7.43909 osd.195 up 1.00000 1.00000
196 hdd 7.43909 osd.196 up 1.00000 1.00000
197 hdd 7.43909 osd.197 up 1.00000 1.00000
218 hdd 7.27739 osd.218 up 1.00000 1.00000
-10 133.74187 host cepho-p011
198 hdd 7.43909 osd.198 up 1.00000 1.00000
199 hdd 7.43909 osd.199 up 1.00000 1.00000
200 hdd 7.43909 osd.200 up 1.00000 1.00000
201 hdd 7.43909 osd.201 up 1.00000 1.00000
203 hdd 7.43909 osd.203 up 1.00000 1.00000
204 hdd 7.43909 osd.204 up 1.00000 1.00000
205 hdd 7.43909 osd.205 up 1.00000 1.00000
206 hdd 7.43909 osd.206 up 1.00000 1.00000
207 hdd 7.43909 osd.207 up 1.00000 1.00000
208 hdd 7.43909 osd.208 up 1.00000 1.00000
209 hdd 7.43909 osd.209 up 1.00000 1.00000
210 hdd 7.43909 osd.210 up 1.00000 1.00000
211 hdd 7.43909 osd.211 up 1.00000 1.00000
212 hdd 7.43909 osd.212 up 1.00000 1.00000
213 hdd 7.43909 osd.213 up 1.00000 1.00000
214 hdd 7.43909 osd.214 up 1.00000 1.00000
215 hdd 7.43909 osd.215 up 1.00000 1.00000
217 hdd 7.27739 osd.217 up 1.00000 1.00000```
|
2024-09-11T13:44:49.896Z | <verdurin> Continued:
``` -8 767.54645 datacenter xx2
-17 267.80713 rack rack-7
-33 133.90356 host cepho-p006
0 hdd 7.43909 osd.0 up 1.00000 1.00000
12 hdd 7.43909 osd.12 up 1.00000 1.00000
21 hdd 7.43909 osd.21 up 1.00000 1.00000
27 hdd 7.43909 osd.27 up 1.00000 1.00000
33 hdd 7.43909 osd.33 up 1.00000 1.00000
43 hdd 7.43909 osd.43 up 1.00000 1.00000
52 hdd 7.43909 osd.52 up 1.00000 1.00000
60 hdd 7.43909 osd.60 up 1.00000 1.00000
68 hdd 7.43909 osd.68 up 1.00000 1.00000
76 hdd 7.43909 osd.76 up 1.00000 1.00000
84 hdd 7.43909 osd.84 up 1.00000 1.00000
92 hdd 7.43909 osd.92 up 1.00000 1.00000
100 hdd 7.43909 osd.100 up 1.00000 1.00000
108 hdd 7.43909 osd.108 up 1.00000 1.00000
117 hdd 7.43909 osd.117 up 1.00000 1.00000
126 hdd 7.43909 osd.126 up 1.00000 1.00000
133 hdd 7.43909 osd.133 up 1.00000 1.00000
141 hdd 7.43909 osd.141 up 1.00000 1.00000
-25 133.90356 host cepho-p007
5 hdd 7.43909 osd.5 up 1.00000 1.00000
9 hdd 7.43909 osd.9 up 1.00000 1.00000
20 hdd 7.43909 osd.20 up 1.00000 1.00000
26 hdd 7.43909 osd.26 up 1.00000 1.00000
36 hdd 7.43909 osd.36 up 1.00000 1.00000
41 hdd 7.43909 osd.41 up 1.00000 1.00000
48 hdd 7.43909 osd.48 up 1.00000 1.00000
56 hdd 7.43909 osd.56 up 1.00000 1.00000
65 hdd 7.43909 osd.65 up 1.00000 1.00000
73 hdd 7.43909 osd.73 up 1.00000 1.00000
80 hdd 7.43909 osd.80 up 1.00000 1.00000
88 hdd 7.43909 osd.88 up 1.00000 1.00000
96 hdd 7.43909 osd.96 up 1.00000 1.00000
104 hdd 7.43909 osd.104 up 1.00000 1.00000
112 hdd 7.43909 osd.112 up 1.00000 1.00000
120 hdd 7.43909 osd.120 up 1.00000 1.00000
128 hdd 7.43909 osd.128 up 1.00000 1.00000
137 hdd 7.43909 osd.137 up 1.00000 1.00000
-14 499.73932 rack rack-8
-37 365.99744 host cepho-p004
6 hdd 20.33319 osd.6 up 1.00000 1.00000
15 hdd 20.33319 osd.15 up 1.00000 1.00000
23 hdd 20.33319 osd.23 up 1.00000 1.00000
31 hdd 20.33319 osd.31 up 1.00000 1.00000
38 hdd 20.33319 osd.38 up 1.00000 1.00000
45 hdd 20.33319 osd.45 up 1.00000 1.00000
53 hdd 20.33319 osd.53 up 1.00000 1.00000
63 hdd 20.33319 osd.63 up 1.00000 1.00000
71 hdd 20.33319 osd.71 up 1.00000 1.00000
79 hdd 20.33319 osd.79 up 1.00000 1.00000
87 hdd 20.33319 osd.87 up 1.00000 1.00000
95 hdd 20.33319 osd.95 up 1.00000 1.00000
110 hdd 20.33319 osd.110 up 1.00000 1.00000
118 hdd 20.33319 osd.118 up 1.00000 1.00000
125 hdd 20.33319 osd.125 up 1.00000 1.00000
134 hdd 20.33319 osd.134 up 1.00000 1.00000
142 hdd 20.33319 osd.142 up 1.00000 1.00000
216 hdd 20.33319 osd.216 up 1.00000 1.00000
-29 133.74187 host cepho-p005
4 hdd 7.43909 osd.4 up 1.00000 1.00000
10 hdd 7.43909 osd.10 up 1.00000 1.00000
18 hdd 7.43909 osd.18 up 1.00000 1.00000
25 hdd 7.43909 osd.25 up 1.00000 1.00000
35 hdd 7.43909 osd.35 up 1.00000 1.00000
44 hdd 7.43909 osd.44 up 1.00000 1.00000
51 hdd 7.43909 osd.51 up 1.00000 1.00000
59 hdd 7.43909 osd.59 up 1.00000 1.00000
66 hdd 7.43909 osd.66 up 1.00000 1.00000
75 hdd 7.43909 osd.75 up 1.00000 1.00000
81 hdd 7.43909 osd.81 up 1.00000 1.00000
91 hdd 7.43909 osd.91 up 1.00000 1.00000
99 hdd 7.43909 osd.99 up 1.00000 1.00000
107 hdd 7.43909 osd.107 up 1.00000 1.00000
115 hdd 7.43909 osd.115 up 1.00000 1.00000
123 hdd 7.43909 osd.123 up 1.00000 1.00000
130 hdd 7.27739 osd.130 up 1.00000 1.00000
138 hdd 7.43909 osd.138 up 1.00000 1.00000``` |
2024-09-11T13:57:41.222Z | <Benard> I am using cephadm to configure some services, in particular SSL certificates for RGW. I have a yml definition file, and when I update that file and apply it I can see that the config in cephadm gets updated; however, the haproxy containers are still using the old cert.
Is there a way for me to apply that config properly, short of removing and redeploying the service? |
2024-09-11T13:58:53.345Z | <Ken Carlile> I ran into this too, let me check my notes to see if I actually noted how to get around it |
2024-09-11T13:59:09.105Z | <Ken Carlile> I'm just about to be in a meeting, but I'll check afterwards. |
2024-09-11T13:59:52.633Z | <Benard> Thanks Ken |
2024-09-11T14:01:18.082Z | <Ken Carlile> ```When cert is changed, need to change the spec file, apply the spec file, then redeploy the services:
ceph orch apply -i rgw_spec.yaml
ceph orch apply -i rgw-ingress_spec.yaml
ceph orch redeploy rgw.s3rgw
ceph orch redeploy ingress.s3rgw.ingress``` |
2024-09-11T14:01:21.663Z | <Ken Carlile> that's from my notes |
2024-09-11T14:01:29.035Z | <Ken Carlile> so yes, you need to re-apply and then redeploy |
2024-09-11T14:01:40.366Z | <Ken Carlile> at least in my whole vast experience of using this for 2 weeks kinda |
2024-09-11T14:01:44.945Z | <Ken Carlile> it was pretty painless |
2024-09-11T14:02:11.421Z | <Benard> Thanks for confirming Ken 👍 |
2024-09-11T14:02:32.144Z | <Ken Carlile> still waiting for the meeting to start 😛 |
2024-09-11T14:25:24.532Z | <Ken Carlile> No worries. The other gotcha I ran into (and why I know about replacing the cert at all) is that it needs the crt with the full chain. At least in our case. |
2024-09-11T14:33:19.229Z | <Benard> I remember running into this somewhere else, but in my case in this environment it worked fine with just the normal cert 🤷♂️ |
2024-09-11T14:36:16.700Z | <Ken Carlile> Maybe because we use a wildcard cert. Got me, certs are very much a thing that I barely have a handle on understanding. |
2024-09-11T14:38:33.914Z | <Eugen Block> So just to be on the same page: the `ceph orch host ls` output is correct, but in the crush tree the name cepho-p008 is the correct one (though it's empty) while the OSDs are underneath cepho-p008-ceph. Are you sure you restarted the OSDs on p008 after you fixed the hostname? If not, I'd give that a try, maybe reboot the entire host to ensure the hostname is correct. I expect the OSDs to be reported underneath p008 after a reboot/restart. |
2024-09-11T14:41:21.653Z | <Eugen Block> And if that doesn't work either, you could rename the (currently empty) bucket cepho-p008 (cepho-p008-temp or something): ceph osd crush rename-bucket ...
And then rename cepho-p008-ceph to cepho-p008, that should work, shouldn't it? |
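2024-09-11T14:42:05.000Z | <Eugen Block> A sketch of that rename sequence (the temp name is just an example; double-check with `ceph osd tree` that the cepho-p008 bucket really is empty before touching anything):
```# Move the empty bucket out of the way under a temporary name
ceph osd crush rename-bucket cepho-p008 cepho-p008-old

# Rename the populated bucket to the correct hostname
ceph osd crush rename-bucket cepho-p008-ceph cepho-p008

# Once the OSDs report under the right name, drop the leftover empty bucket
ceph osd crush remove cepho-p008-old``` |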
2024-09-11T14:51:03.020Z | <Wes Dillingham> Are these versioned buckets? |
2024-09-11T14:57:24.715Z | <Wes Dillingham> I don't fully understand the interplay with versioned buckets, but we see large omaps and resharding issues (unable to stay below the omap key threshold for versioned buckets). <https://tracker.ceph.com/issues/67884> |
2024-09-11T15:47:32.256Z | <verdurin> Good point - I could certainly reboot it and see what happens. |
2024-09-11T15:47:36.607Z | <Ken Carlile> Question on service count: is there an issue with running multiple alertmanagers? If I put the host that the single alertmanager is running on into maintenance, the dashboard kind of loses its mind. Can't find anything in the docs about this. |
2024-09-11T15:50:11.423Z | <Ken Carlile> I don't have it specified for any particular host, but when setting a host to maintenance mode, it doesn't seem to trigger alertmanager starting on anything else. |
2024-09-11T15:50:45.209Z | <Ken Carlile> For that matter, if I do start alertmanager on another host (by upping the instances to 2), I have to manually reconfigure the dashboard to use the other host. This is the same for prometheus. |
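2024-09-11T15:52:10.000Z | <Ken Carlile> For reference, this is roughly what I mean by manually reconfiguring the dashboard (hostname and ports are examples from my setup; use whichever host the daemons actually landed on):
```# Point the dashboard at the new Alertmanager and Prometheus endpoints
ceph dashboard set-alertmanager-api-host 'http://cephm-p001.ceph:9093'
ceph dashboard set-prometheus-api-host 'http://cephm-p001.ceph:9095'``` |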
2024-09-11T17:00:31.976Z | <Michael W> We think we tracked down the issue with the Ceph large omap issues. It seems that our Veeam backups are set to immutable (to prevent deletion) and Ceph doesn't like that for handling object replication, so it complains of the large omap issues. The best we were told to do is just mute the alert for it. |
2024-09-11T17:44:42.898Z | <drakonstein> You can also modify these settings to still get warned if you reach a higher value, but this is only really useful if you have a way to reduce the omap size
`osd_deep_scrub_large_omap_object_key_threshold`
`osd_deep_scrub_large_omap_object_value_sum_threshold` |
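2024-09-11T17:45:10.000Z | <drakonstein> For example, raising the thresholds cluster-wide would look something like this (the values are illustrative; pick numbers that match your actual omap sizes):
```# Warn at 800k keys instead of the default 200k
ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 800000

# Warn at 2 GiB total omap value size instead of the default 1 GiB
ceph config set osd osd_deep_scrub_large_omap_object_value_sum_threshold 2147483648``` |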
2024-09-11T17:45:17.802Z | <Wes Dillingham> in our case we quadrupled the value of the key threshold |
2024-09-11T17:45:53.865Z | <Wes Dillingham> to 800,000 iirc |
2024-09-11T17:45:57.895Z | <drakonstein> This blog talks about reducing the default value for the key threshold from 2,000,000 to 200,000 [https://42on.com/how-to-handle-large-omap-objects/#:~:text=The%20value%20of%20%E2%80%9Cosd_deep_scrub_large_omap_object_key_threshold%E2%80%9D%20determines,value%20for%20this%20is%20200000](https://42on.com/how-to-handle-large-omap-objects/#:~:text=The%20value%20of%20%E2%80%9Cosd_deep_scrub_large_omap_object_key_threshold%E2%80%9D%20determines,value%20for%20this%20is%20200000). |