ceph - ceph-devel - 2024-08-28

Timestamp (UTC) Message
2024-08-28T07:44:19.491Z
<Venky Shankar> @rzarzynski Could you plan to have a look at <https://tracker.ceph.com/issues/67595> please?
2024-08-28T07:48:11.173Z
<Henrik Korkuc> is it possible that the <https://tracker.ceph.com/issues/46845> issue has reappeared? I wasn't able to run OSDs (mons and mgrs worked fine) in an IPv6-only environment until I set ms_bind_ipv4 to false. I was using cephadm to deploy.
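For reference, the workaround comes down to Ceph's messenger bind options. The sketch below renders them as a `[global]` ceph.conf fragment. It assumes you manage a config file; with cephadm the centralized config database (`ceph config set`) is the more usual place. Only `ms_bind_ipv4 = false` comes from the message above; `ms_bind_ipv6 = true` is an added assumption for an IPv6-only cluster.

```python
import configparser
import io


def ipv6_only_global_section() -> str:
    """Render a [global] ceph.conf fragment for an IPv6-only cluster (sketch)."""
    conf = configparser.ConfigParser()
    conf["global"] = {
        "ms_bind_ipv4": "false",  # from the thread: stop binding IPv4 addresses
        "ms_bind_ipv6": "true",   # assumption: bind IPv6 addresses explicitly
    }
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(ipv6_only_global_section())
```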
2024-08-28T09:07:26.580Z
<Rost Khudov> But I still got a problem with 2 tests:
test_metadata_filter_ampq
```botocore.exceptions.ParamValidationError: Parameter validation failed:
Unknown parameter in NotificationConfiguration.TopicConfigurations[0].Filter: "Metadata", must be one of: Key```
and
test_ps_s3_tags_on_master
```botocore.exceptions.ParamValidationError: Parameter validation failed:
Unknown parameter in NotificationConfiguration.TopicConfigurations[0].Filter: "Tags", must be one of: Key```
is there any reason for that?
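To see why the stock client rejects these requests, the sketch below builds the `TopicConfigurations` entry the tests send. It assumes the RGW `Metadata` and `Tags` filters use the same `FilterRules` list of `Name`/`Value` pairs as the standard `Key` filter. The topic ARN, rule values and helper names are hypothetical. Per the error text, stock botocore only knows the `Key` filter, so any other filter key fails parameter validation before the request is ever sent.

```python
from typing import Iterable, List, Optional, Tuple

Rules = Optional[List[Tuple[str, str]]]

STOCK_S3_FILTER_KEYS = {"Key"}  # what stock botocore accepts, per the error message


def build_topic_config(topic_arn: str, events: Iterable[str], key_rules: Rules = None,
                       metadata_rules: Rules = None, tag_rules: Rules = None) -> dict:
    """Build one TopicConfigurations entry; each rule list is [(name, value), ...]."""
    def rules(pairs):
        return {"FilterRules": [{"Name": n, "Value": v} for n, v in pairs]}

    flt = {}
    if key_rules:
        flt["Key"] = rules(key_rules)
    if metadata_rules:
        flt["Metadata"] = rules(metadata_rules)  # RGW extension (assumed shape)
    if tag_rules:
        flt["Tags"] = rules(tag_rules)           # RGW extension (assumed shape)
    cfg = {"TopicArn": topic_arn, "Events": list(events)}
    if flt:
        cfg["Filter"] = flt
    return cfg


def unknown_for_stock_client(cfg: dict) -> List[str]:
    """Filter keys that botocore rejects unless the sdk-extras model is installed."""
    return sorted(set(cfg.get("Filter", {})) - STOCK_S3_FILTER_KEYS)
```

With the extras model installed, such a `cfg` would be passed as `NotificationConfiguration={"TopicConfigurations": [cfg]}` to the S3 client's `put_bucket_notification_configuration`.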
2024-08-28T09:08:18.644Z
<Yuval Lifshitz> these tests (and other tests as well) use extensions to the AWS API
2024-08-28T09:09:28.707Z
<Yuval Lifshitz> there are instructions for that here: <https://github.com/ceph/ceph/tree/main/examples/rgw/boto3#users>
2024-08-28T09:09:50.088Z
<Yuval Lifshitz> note that this is not new, and it is unrelated to the localhost change
2024-08-28T09:11:40.986Z
<Yuval Lifshitz> will add a note about that here: <https://github.com/ceph/ceph/blob/main/src/test/rgw/bucket_notification/README.rst> as well
2024-08-28T09:12:10.349Z
<Rost Khudov> yes, thank you, because it is not clear when you are just running RGW notification tests
2024-08-28T09:27:06.587Z
<Yuval Lifshitz> trying to build "squid" and keep hitting this error:
```librados.so: undefined reference to `Message::encode_otel_trace(ceph::buffer::v15_2_0::list&, unsigned long) const'
librados.so: undefined reference to `Message::decode_otel_trace(ceph::buffer::v15_2_0::list::iterator_impl<true>&, bool)'
librados.so: undefined reference to `fmt::v9::vformat[abi:cxx11](fmt::v9::basic_string_view<char>, fmt::v9::basic_format_args<fmt::v9::basic_format_context<fmt::v9::appender, char> >)'```
any idea?
2024-08-28T09:50:45.595Z
<Rost Khudov> and the documentation itself is not very clear; there are no explicit steps saying what you have to include and how to include this extra file
2024-08-28T09:51:47.915Z
<Yuval Lifshitz> the doc says:

For the standard client to support these extensions, the `service-2.sdk-extras.json` file should be placed under the `~/.aws/models/s3/2006-03-01/` directory. For more information see here.
2024-08-28T09:52:08.818Z
<Yuval Lifshitz> you copy service-2.sdk-extras.json to ~/.aws/models/s3/2006-03-01/
2024-08-28T09:53:12.165Z
<Rost Khudov> I think the `.aws` folder exists only when you install awscli and run the configure command, but it doesn't exist if you install boto3 with pip
2024-08-28T09:53:44.613Z
<Yuval Lifshitz> ok. so please create it. it would work with boto3 as well
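Creating the directory and copying the file can be scripted. A minimal sketch, assuming the extras file is taken from a ceph checkout (its location there, for example `examples/rgw/boto3/service-2.sdk-extras.json`, is an assumption) and that the helper name `install_sdk_extras` is hypothetical. `Path.home()` resolves per user, so run it as the same user that runs the tests.

```python
import shutil
from pathlib import Path
from typing import Optional

# botocore's per-user model directory for the S3 API version
S3_MODEL_SUBDIR = Path(".aws") / "models" / "s3" / "2006-03-01"


def install_sdk_extras(src: Path, home: Optional[Path] = None) -> Path:
    """Copy service-2.sdk-extras.json into ~/.aws/models/s3/2006-03-01/.

    Creates the directory tree when it is missing, e.g. when boto3 was
    installed with pip and `aws configure` was never run.
    """
    home = Path.home() if home is None else home
    dest_dir = home / S3_MODEL_SUBDIR
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / "service-2.sdk-extras.json"
    shutil.copyfile(src, dest)
    return dest
```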
2024-08-28T09:53:52.915Z
<Yuval Lifshitz> will add this to the doc
2024-08-28T10:21:09.141Z
<Rost Khudov> hmm, it looks like just creating the `~/.aws/models/s3/2006-03-01/` directory and copying the json file there does not work with boto3 by default
2024-08-28T10:21:52.782Z
<Rost Khudov> but according to this [doc](https://github.com/boto/botocore/blob/develop/botocore/loaders.py#L33) it should
2024-08-28T10:22:17.717Z
<Yuval Lifshitz> never had issues with that locally or on the test machines
2024-08-28T10:22:46.742Z
<Yuval Lifshitz> make sure you use the same user for running the test
2024-08-28T10:31:47.912Z
<Rost Khudov> when I copy to botocore/data/... it works
so should work with .aws folder as well
thank you for the help!
2024-08-28T10:33:42.585Z
<Yuval Lifshitz> interesting. where is this directory?
2024-08-28T10:35:04.344Z
<Rost Khudov> when you install a package with pip, it goes to `/usr/local/lib/python{version}/site-packages/`
here is the full path:
**`/usr/local/lib/python3.9/site-packages/botocore/data/s3/2006-03-01/service-2.sdk-extras.json`**
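Both locations work because botocore's loader searches a list of data directories, per the loaders.py docstring linked above: `~/.aws/models` first, then the `data` directory inside the installed botocore package. The sketch below is a simplified mirror of that lookup, not botocore's code: the real loader also handles compressed files and caching, and the helper names are hypothetical. It also shows why the user matters, since `~` expands to a different home under `sudo`. To inspect the real list, `botocore.loaders.Loader().search_paths` should print it.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def default_search_paths(home: Path, botocore_root: Path) -> List[Path]:
    """Simplified mirror of botocore's default data search paths, in order."""
    return [home / ".aws" / "models", botocore_root / "data"]


def find_data_file(search_paths: Iterable[Path], relative: str) -> Optional[Path]:
    """Return the first location that has `relative`; earlier paths win."""
    for base in search_paths:
        candidate = base / relative
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    import botocore  # only needed to locate the installed package
    paths = default_search_paths(Path.home(), Path(botocore.__file__).parent)
    print(find_data_file(paths, "s3/2006-03-01/service-2.sdk-extras.json"))
```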
2024-08-28T10:35:25.059Z
<Yuval Lifshitz> thanks
2024-08-28T10:36:45.233Z
<Rost Khudov> Could you post a link to the PR with the updated doc here?
Maybe I will be able to add something
2024-08-28T10:38:16.840Z
<Yuval Lifshitz> i tried to summarize in a tracker: <https://tracker.ceph.com/issues/67768>
feel free to add more info there, and you are very welcome to contribute and create a PR to fix the docs
2024-08-28T10:40:34.842Z
<Rost Khudov> I think I have in mind what I want to add to doc, will try to create PR today
2024-08-28T10:59:55.056Z
<Yuval Lifshitz> sounds good. will review once you have that
2024-08-28T11:19:33.825Z
<Matan Breizman> Hey, can you share the issue?
2024-08-28T11:38:10.696Z
<Yonatan Zaken> Sure,
This is the output of `sudo ./install-deps.sh` on my WSL:

```dh binary
   dh_update_autotools_config
   dh_autoreconf
   create-stamp debian/debhelper-build-stamp
   dh_prep
   dh_auto_install --destdir=debian/ceph-build-deps/
   dh_install
   dh_installdocs
   dh_installchangelogs
   dh_perl
   dh_link
   dh_strip_nondeterminism
   dh_compress
   dh_fixperms
   dh_missing
   dh_dwz
   dh_strip
   dh_makeshlibs
   dh_shlibdeps
   dh_installdeb
   dh_gencontrol
   dh_md5sums
   dh_builddeb
dpkg-deb: error: control directory has bad permissions 777 (must be >=0755 and <=0775)
dh_builddeb: error: dpkg-deb --root-owner-group --build debian/ceph-build-deps .. returned exit code 2
dh_builddeb: error: Aborting due to earlier error
make: *** [debian/rules:3: binary] Error 2
dpkg-buildpackage: error: debian/rules binary subprocess returned exit status 2
Error in the build process: exit status 2
dpkg: error: cannot access archive 'ceph-build-deps_15.2.0-1_amd64.deb': No such file or directory
mk-build-deps: dpkg --unpack failed```

I understand this might be caused by the umask or fmask values set on the mounted directory being used.
Running `umask`, I get: 0022

This is the `/etc/wsl.conf` content I currently have:

```[boot]
systemd=true```
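The permission arithmetic behind this error can be sketched. On a filesystem that honors the umask, a directory requested as 0777 ends up as 0777 & ~0022 = 0755, which satisfies the rule dpkg-deb prints. Seeing 777 therefore suggests the mount is not applying the umask, although the thread does not confirm the exact cause. The check below is a literal reading of the error text (">=0755 and <=0775"); dpkg-deb's internal test may be stricter, and the helper names are hypothetical.

```python
import os
import stat


def effective_dir_mode(umask: int, requested: int = 0o777) -> int:
    """Mode a new directory gets on a filesystem that honors the umask."""
    return requested & ~umask


def dpkg_control_dir_ok(mode: int) -> bool:
    """Literal reading of the error: permissions must be >=0755 and <=0775."""
    perms = stat.S_IMODE(mode) & 0o777
    return 0o755 <= perms <= 0o775


def check_dir(path: str) -> bool:
    """Apply the same check to an existing directory, e.g. debian/ceph-build-deps/DEBIAN."""
    return dpkg_control_dir_ok(os.stat(path).st_mode)


if __name__ == "__main__":
    mode = effective_dir_mode(0o022)
    print(oct(mode), dpkg_control_dir_ok(mode))  # 0o755 True
    print(dpkg_control_dir_ok(0o777))           # False, as in the error above
```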
2024-08-28T13:08:37.103Z
<Matan Breizman> Looks WSL specific, did you try this:
<https://www.reddit.com/r/bashonubuntuonwindows/comments/a7v5d8/problems_with_dpkgdeb_bad_permissions_how_do_i/>
2024-08-28T13:21:25.813Z
<Yonatan Zaken> I will try and update, thanks Matan
2024-08-28T15:25:04.568Z
<Casey Bodley> weekly rgw meeting starting soon in <https://meet.google.com/mmj-uzzv-qce> (pad: <https://pad.ceph.com/p/rgw-weekly>)
2024-08-28T16:22:07.712Z
<Casey Bodley> rgw jobs on all releases started failing today with `AssertionError: remote smithi044.front.sepia.ceph.com has osd roles, but no osd devices were specified!`, any idea what changed?
2024-08-28T16:25:10.086Z
<Casey Bodley> ```2024-08-28T16:03:02.534 DEBUG:teuthology.misc:devs=['/dev/vg_nvme/lv_1', '/dev/vg_nvme/lv_2', '/dev/vg_nvme/lv_3', '/dev/vg_nvme/lv_4']
2024-08-28T16:03:02.534 DEBUG:teuthology.orchestra.run.smithi044:> stat /dev/vg_nvme/lv_1
2024-08-28T16:03:02.588 DEBUG:teuthology.orchestra.run:got remote process result: 1
2024-08-28T16:03:02.588 INFO:teuthology.orchestra.run.smithi044.stderr:stat: cannot statx '/dev/vg_nvme/lv_1': No such file or directory
2024-08-28T16:03:02.588 DEBUG:teuthology.misc:get_scratch_devices: /dev/vg_nvme/lv_1 does not exist
2024-08-28T16:03:02.589 INFO:tasks.ceph:osd dev map: {}
2024-08-28T16:03:02.589 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_cf5021f85c4b0bf435c74a1183036b3d19af44b5/teuthology/contextutil.py", line 30, in nested
    vars.append(enter())
  File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_b3b2fa5e3c1cddde679d8fca5fc24bc1f25fe87a/qa/tasks/ceph.py", line 676, in cluster
    assert roles_to_devs, \
AssertionError: remote smithi044.front.sepia.ceph.com has osd roles, but no osd devices were specified!```
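The log shows a short chain: teuthology lists the expected `/dev/vg_nvme/lv_*` volumes, `stat` finds them missing, the OSD device map comes out empty, and the assertion fires. A simplified sketch of that chain, with hypothetical helpers that are not teuthology's code, shows that the assertion is only a symptom: the real question is why the volumes are absent, which the discussion below traces to ansible not running.

```python
import os
from typing import Dict, List


def existing_devices(devs: List[str]) -> List[str]:
    """Keep only the device paths that exist, like the `stat` probes in the log."""
    return [dev for dev in devs if os.path.exists(dev)]


def build_osd_dev_map(osd_roles: List[str], devs: List[str]) -> Dict[str, str]:
    """Hypothetical pairing of OSD roles with the scratch devices that exist."""
    return dict(zip(osd_roles, existing_devices(devs)))


def check_remote(remote: str, osd_roles: List[str], devs: List[str]) -> Dict[str, str]:
    roles_to_devs = build_osd_dev_map(osd_roles, devs)
    # Same condition and message as the assertion in the traceback above.
    if osd_roles and not roles_to_devs:
        raise AssertionError(
            f"remote {remote} has osd roles, but no osd devices were specified!")
    return roles_to_devs


if __name__ == "__main__":
    lvs = [f"/dev/vg_nvme/lv_{i}" for i in range(1, 5)]
    print(existing_devices(lvs))  # empty on a host where the LVs were never created
```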
2024-08-28T16:37:30.028Z
<Casey Bodley> it's showing up in the [fs suite](https://pulpito.ceph.com/pdonnell-2024-08-28_16:26:41-fs-wip-pdonnell-testing-20240828.032152-debug-distro-default-smithi/) too, cc @Patrick Donnelly @Venky Shankar
2024-08-28T16:47:08.816Z
<Patrick Donnelly> Yes, very suddenly
2024-08-28T16:48:40.268Z
<Patrick Donnelly> cc @Zack Cerza
2024-08-28T16:49:25.495Z
<Patrick Donnelly> I don't see a recent change to teuthology merged
2024-08-28T17:07:19.518Z
<Dan Mick> Seems more like a change to the job yml would cause this. Not saying there was one
2024-08-28T17:12:44.882Z
<Casey Bodley> started happening on quincy, squid and main at the same time, so unlikely due to changes in the suite branch
2024-08-28T17:13:25.680Z
<Zack Cerza> got a link to one handy?
2024-08-28T17:13:40.949Z
<Casey Bodley> <https://qa-proxy.ceph.com/teuthology/cbodley-2024-08-28_15:52:13-rgw-wip-67554-squid-distro-default-smithi/7878118/teuthology.log>
2024-08-28T17:14:05.848Z
<Casey Bodley> > ```2024-08-28T16:00:55.653 INFO:teuthology.orchestra.run.smithi105.stderr:stat: cannot statx '/dev/vg_nvme/lv_1': No such file or directory```
2024-08-28T17:19:33.892Z
<Zack Cerza> <https://qa-proxy.ceph.com/teuthology/cbodley-2024-08-28_15:52:13-rgw-wip-67554-squid-distro-default-smithi/7878118/ansible.log>
ansible didn't run at all
2024-08-28T17:33:51.150Z
<Casey Bodley> any idea where that `/etc/ansible/hosts/sepia` file comes from?
2024-08-28T17:35:24.473Z
<Zack Cerza> yeah, ceph-sepia-secrets.git
the last time the file was changed was right around when sentry saw the first failure: <https://github.com/ceph/ceph-sepia-secrets/pull/904>
2024-08-28T17:37:59.741Z
<Zack Cerza> <https://docs.ansible.com/ansible/latest/inventory_guide/intro_inventory.html#inventory-basics-formats-hosts-and-groups>: "Group names should follow the same guidelines as [Creating valid variable names](https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_variables.html#valid-variable-names)."
<https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_variables.html#valid-variable-names>: "A variable name cannot begin with a number"
<https://github.com/ceph/ceph-sepia-secrets/pull/904/files#diff-6b0046333530400164979089341a41bd1f3459ad4f68d8e138e1c39f5fed13ccR734>: "2_jenkins_builders"
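The naming rule quoted above is easy to check mechanically before merging an inventory change. A minimal sketch, assuming a plain INI inventory: it applies the variable-name rule (letters, digits and underscores, not starting with a digit) to section headers and ignores Ansible's reserved-keyword rules. The helper names and the sample inventory in the usage are hypothetical.

```python
import re
from typing import List

# Ansible variable-name rule, recommended for group names as well
# (reserved keywords are not checked in this sketch).
_VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# INI inventory section headers: [group], [group:children], [group:vars]
_SECTION = re.compile(r"\[([^\]:]+)(?::(?:children|vars))?\]")


def is_valid_group_name(name: str) -> bool:
    return _VALID_NAME.fullmatch(name) is not None


def invalid_groups(inventory_text: str) -> List[str]:
    """Return group names from section headers that break the naming rule."""
    bad: List[str] = []
    for line in inventory_text.splitlines():
        m = _SECTION.fullmatch(line.strip())
        if m and not is_valid_group_name(m.group(1)) and m.group(1) not in bad:
            bad.append(m.group(1))
    return bad
```

For example, `is_valid_group_name("2_jenkins_builders")` is `False`, while `jenkins_builders` passes.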
2024-08-28T17:48:19.517Z
<Zack Cerza> I self-merged this, which should resolve the issue
2024-08-28T17:48:29.612Z
<Zack Cerza> I self-merged this, which should resolve the issue: <https://github.com/ceph/ceph-sepia-secrets/pull/920>
2024-08-28T17:50:01.263Z
<Casey Bodley> thanks, rescheduling to test
2024-08-28T18:02:08.019Z
<Casey Bodley> @Zack Cerza still failing, e.g. [teuthology.log](https://qa-proxy.ceph.com/teuthology/cbodley-2024-08-28_17:49:24-rgw-wip-67554-squid-distro-default-smithi/7878301/teuthology.log) and [ansible.log](https://qa-proxy.ceph.com/teuthology/cbodley-2024-08-28_17:49:24-rgw-wip-67554-squid-distro-default-smithi/7878301/ansible.log)
2024-08-28T18:04:17.354Z
<Casey Bodley> `jenkins_builders` is listed under `[jenkins_builders:children]`
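A group listed under its own `:children` section makes the group hierarchy cyclic, which Ansible rejects. The sketch below detects such cycles in a simplified INI inventory: it takes only the first token of each line and does not handle host ranges or inline variables. The function names and sample inventories are hypothetical.

```python
import re
from typing import Dict, List, Optional, Set

_CHILDREN_HDR = re.compile(r"\[([^\]:]+):children\]")
_ANY_HDR = re.compile(r"\[.*\]")


def parse_children(inventory_text: str) -> Dict[str, List[str]]:
    """Map each group to the groups listed under its [group:children] section."""
    children: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in inventory_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        m = _CHILDREN_HDR.fullmatch(line)
        if m:
            current = m.group(1)
            children.setdefault(current, [])
        elif _ANY_HDR.fullmatch(line):
            current = None  # a hosts or vars section ends the children list
        elif current is not None:
            children[current].append(line.split()[0])
    return children


def find_cycle(children: Dict[str, List[str]]) -> List[str]:
    """Return one cycle such as ['a', 'a'], or [] if the hierarchy is acyclic."""
    visiting: Set[str] = set()
    done: Set[str] = set()
    path: List[str] = []

    def dfs(group: str) -> List[str]:
        if group in visiting:  # back edge: group is already on the current path
            return path[path.index(group):] + [group]
        if group in done:
            return []
        visiting.add(group)
        path.append(group)
        for child in children.get(group, []):
            cycle = dfs(child)
            if cycle:
                return cycle
        path.pop()
        visiting.discard(group)
        done.add(group)
        return []

    for group in children:
        cycle = dfs(group)
        if cycle:
            return cycle
    return []
```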
2024-08-28T18:15:07.489Z
<Zack Cerza> off
2024-08-28T18:15:12.565Z
<Zack Cerza> oof
2024-08-28T18:15:23.389Z
<Zack Cerza> ok, I think I have a fix for that too
2024-08-28T18:29:23.410Z
<Dan Mick> thanks @Zack Cerza
2024-08-28T20:33:05.635Z
<Yonatan Zaken> Thanks, this worked for me. WSL users: make sure to restart WSL after editing wsl.conf, as described in the link above 🙂
2024-08-28T20:56:12.541Z
<Samuel Just> Probably need to install grpc-devel package -- I don't think the dependencies were updated
2024-08-28T20:58:01.976Z
<Frank Filz> I have grpc-devel installed... It's been suggested to build without NVME so I've done that.
2024-08-28T21:03:38.198Z
<Samuel Just> Hmm, ok
2024-08-28T22:53:40.738Z
<Dan Mick> oh **scratch devices**, I remember those