ceph - ceph-devel - 2024-11-28

Timestamp (UTC)Message
2024-11-28T04:14:08.890Z
<bl___> Dan Mick: example error: `copying unsafe symlink "rpm-15.2.1" -> "/overflow/rpm-15.2.1"` + `symlink has no referent: "/rpm-15.2.1" (in ceph)`
2024-11-28T04:16:06.143Z
<bl___> I don't know if this one is from my side or server side (on my side of course I won't allow writes to root dir): `rsync: opendir "/rpm-17.2.8" (in ceph) failed: Permission denied (13)`
2024-11-28T04:17:33.442Z
<bl___> hmpf maybe wrong permissions on that one in the server side? my permissions look OK...
2024-11-28T04:26:52.025Z
<bl___> yeah the RPM repo after rsync seems empty in /rpm-17.2.8 and this breaks all clients using `rpm-quincy` as repo for Ceph because the symlink is updated to 17.2.8 which cannot be rsynced, please check permissions on this?
2024-11-28T04:50:48.673Z
<bl___> This permission problem has broken all mirrors around the world, example: https://mirror.csclub.uwaterloo.ca/ceph/rpm-quincy/
2024-11-28T04:51:41.080Z
<bl___> https://ftp.gwdg.de/pub/misc/ceph/rpm-quincy/
2024-11-28T04:52:38.474Z
<bl___> http://ftp2.de.freebsd.org/pub/misc/ceph/rpm-quincy/ and so on
2024-11-28T06:10:41.900Z
<nizamial09> Hi, I am trying to add the `colorama` dependency to the [mgr's python requirements file](https://github.com/ceph/githubcheck). I've added the `python3-colorama` dependency to the `ceph.spec` and `debian/control` files so it can pass the API and make checks, but the [docs test keeps failing](https://jenkins.ceph.com/job/ceph-pr-docs/109937/console), saying the colorama module was not found. Any idea where I should actually add the dependency for this to work?

PR: <https://github.com/ceph/ceph/pull/60749/>

Additionally, I see `The full traceback has been saved in /tmp/sphinx-err-c1umhqlg.log,` in the log, but I suppose I need access to the machine in order to view the file. Maybe the log could be exported so anyone checking the Jenkins log can download it as an artifact and see the actual error?
```updating environment: [new config] 554 added, 0 changed, 0 removed
reading sources... [  0%] api/index
reading sources... [  0%] api/mon_command_api
loading mgr module 'alerts'...

Exception occurred:
  File "/home/jenkins-build/build/workspace/ceph-pr-docs/src/pybind/mgr/mgr_util.py", line 2, in <module>
    from colorama import Fore, Style
ModuleNotFoundError: No module named 'colorama'
The full traceback has been saved in /tmp/sphinx-err-c1umhqlg.log, if you want to report the issue to the developers.```
2024-11-28T06:37:45.288Z
<Rico> thanks for your feedback. wait & see ...
2024-11-28T09:24:32.608Z
<Yonatan Zaken> Hi All,
We are currently troubleshooting a ceph related networking issue on one of our setups.

We are using ceph squid 19.2.0
Network configuration is IPv6
```global                                                     advanced  ms_bind_ipv4                                                           false
global                                                     advanced  ms_bind_ipv6                                                           true
global                                                     advanced  cluster_network                                                        ceff:b::0/120                                              *
global                                                     advanced  public_network                                                         ceff:a::0/120                                              *
mon                                                        advanced  public_network                                                         ceff:a::0/120                                              *```
After the Ceph installation we see the following health warning:
```[root@bvt-centralsite-centralsitemanager-0 log (Active)]# ceph health detail
[ERR] OSD_UNREACHABLE: 3 osds(s) are not reachable
    osd.0's public address is not in 'ceff:a::/120' subnet
    osd.1's public address is not in 'ceff:a::/120' subnet
    osd.2's public address is not in 'ceff:a::/120' subnet```
Inspecting the public_network fields of all OSD's we haven't seen any issue with the configured values.

From the monitor logs we have noticed the following error:
```2024-11-27T07:06:31.658-0800 7f11b0093640 -1 unable to convert chosen address to string: ceff:a::2```
This led us to search the Ceph code for this error message, which brought us to the method `is_addr_in_subnet` in `pick_address.cc`.
As far as we understand, this method is invoked from `OSDMap::check_health`.

`is_addr_in_subnet`  is making a call to:
```if(inet_pton(AF_INET, addr.c_str(), &public_addr.sin_addr) != 1) {
    lderr(cct) << "unable to convert chosen address to string: " << addr << dendl;
    return false;
  }```
Which makes us believe that this code doesn't support IPv6 addresses as the address family is `AF_INET` (IPv4)

Do you consider this a bug?
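For illustration, here is a minimal sketch of a family-aware version of the check quoted above. It is not the code from `pick_address.cc` and not a proposed patch. The helper name `addr_in_subnet` and the `2001:db8::` documentation addresses are assumptions made for this sketch. It selects `AF_INET6` when the address text contains a colon, so `inet_pton` no longer rejects IPv6 input, and then compares the leading prefix bits.

```cpp
#include <arpa/inet.h>
#include <sys/socket.h>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (not Ceph's is_addr_in_subnet): returns true when
// `addr` lies inside `network`/`prefix_len`. The address family is inferred
// from the text: a ':' means IPv6, otherwise IPv4.
bool addr_in_subnet(const std::string& network, unsigned prefix_len,
                    const std::string& addr) {
  const int family =
      (addr.find(':') != std::string::npos) ? AF_INET6 : AF_INET;
  const std::size_t nbytes = (family == AF_INET6) ? 16 : 4;
  if (prefix_len > nbytes * 8) return false;

  unsigned char net_bytes[16] = {0};
  unsigned char addr_bytes[16] = {0};
  // inet_pton returns 1 only on success; text that is not valid for the
  // requested family yields 0.
  if (inet_pton(family, network.c_str(), net_bytes) != 1) return false;
  if (inet_pton(family, addr.c_str(), addr_bytes) != 1) return false;

  // Both buffers are in network byte order, so compare the whole bytes
  // covered by the prefix first, then the remaining high bits.
  const std::size_t full = prefix_len / 8;
  if (std::memcmp(net_bytes, addr_bytes, full) != 0) return false;
  const unsigned rest = prefix_len % 8;
  if (rest == 0) return true;
  const unsigned char mask = static_cast<unsigned char>(0xFF << (8 - rest));
  return (net_bytes[full] & mask) == (addr_bytes[full] & mask);
}
```

With this sketch, `addr_in_subnet("2001:db8::", 120, "2001:db8::2")` is true, whereas a plain `inet_pton(AF_INET, "2001:db8::2", ...)` returns 0, which corresponds to the failure path in the snippet above.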
2024-11-28T09:25:52.322Z
<IcePic> Hey, I made a cluster yesterday (ipv4 only) and also get a weird "not reachable" from ceph -s
2024-11-28T09:27:08.503Z
<IcePic> even though rbd bench can talk nicely to the pools. I did have a mismatch on public_network for a moment during install, but I fixed it and restarted all daemons, but it still complains that all my OSDs are supposedly unreachable, but they do work, and "ceph osd dump" shows they all have correct ips in the range now
2024-11-28T09:30:12.999Z
<Yonatan Zaken> Same here, cluster seems to be functioning properly, but complains about OSD's not being reachable. However we used ipv6
2024-11-28T09:30:40.068Z
<IcePic> 17.2.8 here, so not the same release though
2024-11-28T09:31:26.006Z
<IcePic> but it's weird, I have checked and double-checked the public network list in the config against all OSDs, so it's as if there is a third hidden place where "bad" info is still stored
2024-11-28T09:31:59.210Z
<Matan Breizman> Can you please submit a tracker issue?
2024-11-28T09:34:31.134Z
<IcePic> Yonatan's issue with 19 and ipv6 seems to exist as https://tracker.ceph.com/issues/68392
2024-11-28T09:37:09.968Z
<IcePic> this looks more like my issue, we also have several networks in the list and the OSD network is last out of 4 nets. https://tracker.ceph.com/issues/65186
2024-11-28T09:37:40.796Z
<IcePic> though 65186 seems to have been merged this summer
2024-11-28T09:37:45.942Z
<Yonatan Zaken> Sure, np
2024-11-28T09:38:20.062Z
<IcePic> I could reorder the ceph.conf public_networks to see if 17.2.8 for some reason only looks at first entry.
2024-11-28T09:38:33.273Z
<IcePic> we have a short window before we need to put data on it
2024-11-28T09:41:12.178Z
<IcePic> yes, that fixed it
2024-11-28T09:41:26.009Z
<IcePic> also, it was when I restarted my mgrs that the error went away.
2024-11-28T09:41:50.637Z
<IcePic> so "public_network" wants/needs the net OSDs are on first, and then restart mgrs for error to go away
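The symptom described here (the health check passes only when the OSD network is listed first) can be illustrated with a small sketch that contrasts matching against the first entry of a comma-separated network list with matching against any entry. This is an assumption-laden illustration, not Ceph code and not a statement of how 17.2.8 is implemented. The helper names (`split_networks`, `in_cidr_v4`, `in_first_network`, `in_any_network`) and the example subnets are hypothetical, and the CIDR test is IPv4-only to keep it short.

```cpp
#include <arpa/inet.h>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split "a/len, b/len" into whitespace-trimmed entries.
std::vector<std::string> split_networks(const std::string& list) {
  std::vector<std::string> out;
  std::stringstream ss(list);
  std::string item;
  while (std::getline(ss, item, ',')) {
    const auto first = item.find_first_not_of(" \t");
    if (first == std::string::npos) continue;  // skip empty entries
    const auto last = item.find_last_not_of(" \t");
    out.push_back(item.substr(first, last - first + 1));
  }
  return out;
}

// IPv4-only CIDR membership test, kept short for the illustration.
bool in_cidr_v4(const std::string& cidr, const std::string& addr) {
  const auto slash = cidr.find('/');
  if (slash == std::string::npos) return false;
  unsigned long len = 0;
  try {
    len = std::stoul(cidr.substr(slash + 1));
  } catch (...) {
    return false;  // prefix length is not a number
  }
  if (len > 32) return false;
  in_addr net{}, ip{};
  if (inet_pton(AF_INET, cidr.substr(0, slash).c_str(), &net) != 1) return false;
  if (inet_pton(AF_INET, addr.c_str(), &ip) != 1) return false;
  const std::uint32_t mask =
      (len == 0) ? 0 : (~std::uint32_t{0} << (32 - len));
  return (ntohl(net.s_addr) & mask) == (ntohl(ip.s_addr) & mask);
}

// Looks only at the first listed network (the suspected behaviour).
bool in_first_network(const std::string& list, const std::string& addr) {
  const auto nets = split_networks(list);
  return !nets.empty() && in_cidr_v4(nets.front(), addr);
}

// Accepts a match against any listed network (the expected behaviour).
bool in_any_network(const std::string& list, const std::string& addr) {
  for (const auto& net : split_networks(list)) {
    if (in_cidr_v4(net, addr)) return true;
  }
  return false;
}
```

With the list `10.1.0.0/24, 10.2.0.0/24, 192.168.5.0/24` and an OSD at `192.168.5.17`, `in_first_network` is false while `in_any_network` is true. Moving `192.168.5.0/24` to the front makes both agree, which is consistent with the reordering workaround.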
2024-11-28T09:42:22.837Z
<IcePic> I will comment in 65186
2024-11-28T09:43:41.126Z
<Yonatan Zaken> Should I add my description here: <https://tracker.ceph.com/issues/68392>, or open a new one?
2024-11-28T09:44:29.369Z
<Matan Breizman> Seems like it's the same issue so adding there should do, thanks again. I'll try to bring this up
2024-11-28T09:44:40.645Z
<Yonatan Zaken> Thanks
2024-11-28T09:47:37.034Z
<IcePic> Matan: Thanks for your help in sending me to search in the issue tracker for my own error, it helped me at least.
2024-11-28T09:52:18.686Z
<Nitzan Mordechai> I'll take a look, thanks
2024-11-28T09:56:47.176Z
<Yonatan Zaken> Added my notes here:
<https://tracker.ceph.com/issues/68392>
I see a very similar description was reported here:
<https://tracker.ceph.com/issues/67517>
2024-11-28T09:58:28.242Z
<Matan Breizman> Thank you for reporting!
2024-11-28T11:59:18.069Z
<BrianP> _Could I get my tracker account enabled? Thanks._
2024-11-28T13:20:04.145Z
<Fredolin B Brone> Hi all,
   This is my first time building Ceph. I successfully cloned the git repository and completed the prerequisites, but an error occurs when running the `./do_cmake.sh` command. The error looks as follows:
```+ cmake3 -GNinja -DWITH_PYTHON3=3.9 -DCMAKE_CXX_COMPILER=g++ -DCMAKE_C_COMPILER=gcc ..
cmake3: symbol lookup error: /lib64/libldap.so.2: undefined symbol: EVP_md2, version OPENSSL_3.0.0```
Can you please look into this and help me out?
2024-11-28T14:11:44.274Z
<IcePic> Fredolin: Seems your openssl was compiled without md2 support
2024-11-28T14:12:25.663Z
<IcePic> Fredolin: Looks identical to this error report: https://github.com/openssl/openssl/discussions/23639
2024-11-28T15:56:23.738Z
<zigo> Casey Bodley: Will you be at the CERN Cephalocon next week?
2024-11-28T15:56:45.854Z
<zigo> I'd be happy to meet you, and introduce you to my friend Daniel, also maintaining Ceph in Debian.
2024-11-28T15:57:09.990Z
<zigo> And what about Keifu? Will he be there?
