ceph - ceph-devel - 2024-08-28

Timestamp (UTC)	Message
2024-08-28T07:44:19.491Z	<Venky Shankar> @rzarzynski Could you plan to have a look at <https://tracker.ceph.com/issues/67595> please?
2024-08-28T07:48:11.173Z	<Henrik Korkuc> Is it possible that the <https://tracker.ceph.com/issues/46845> issue has reappeared? I wasn't able to run OSDs (mons and mgrs worked fine) in an IPv6-only environment until I set ms_bind_ipv4 to false. I was using cephadm to deploy.
2024-08-28T09:07:26.580Z	<Rost Khudov> But I still got a problem with 2 tests: test_metadata_filter_ampq ```botocore.exceptions.ParamValidationError: Parameter validation failed: Unknown parameter in NotificationConfiguration.TopicConfigurations[0].Filter: "Metadata", must be one of: Key``` and test_ps_s3_tags_on_master ```botocore.exceptions.ParamValidationError: Parameter validation failed: Unknown parameter in NotificationConfiguration.TopicConfigurations[0].Filter: "Tags", must be one of: Key``` is there any reason for that?
2024-08-28T09:08:18.644Z	<Yuval Lifshitz> these tests (and other tests as well) are using extensions to the AWS API
2024-08-28T09:09:28.707Z	<Yuval Lifshitz> there are instructions for that here: <https://github.com/ceph/ceph/tree/main/examples/rgw/boto3#users>
2024-08-28T09:09:50.088Z	<Yuval Lifshitz> note that this is not a new thing. and unrelated to the localhost change
2024-08-28T09:11:40.986Z	<Yuval Lifshitz> will add a note about that here: <https://github.com/ceph/ceph/blob/main/src/test/rgw/bucket_notification/README.rst> as well
2024-08-28T09:12:10.349Z	<Rost Khudov> yes, thank you, because it is not clear when you are just running RGW notification tests
2024-08-28T09:27:06.587Z	<Yuval Lifshitz> trying to build "squid" and keep hitting this error: ```librados.so: undefined reference to `Message::encode_otel_trace(ceph::buffer::v15_2_0::list&, unsigned long) const' librados.so: undefined reference to `Message::decode_otel_trace(ceph::buffer::v15_2_0::list::iterator_impl<true>&, bool)' librados.so: undefined reference to `fmt::v9::vformat[abi:cxx11](fmt::v9::basic_string_view<char>, fmt::v9::basic_format_args<fmt::v9::basic_format_context<fmt::v9::appender, char> >)'``` any idea?
2024-08-28T09:50:45.595Z	<Rost Khudov> and the documentation itself is not really clear; there are no clear steps for what to do with this extra file and how to include it
2024-08-28T09:51:47.915Z	<Yuval Lifshitz> the doc says: For the standard client to support these extensions, the: service-2.sdk-extras.json file should be placed under: ~/.aws/models/s3/2006-03-01/ directory. For more information see here.
2024-08-28T09:52:08.818Z	<Yuval Lifshitz> you copy service-2.sdk-extras.json to ~/.aws/models/s3/2006-03-01/
2024-08-28T09:53:12.165Z	<Rost Khudov> I think the `.aws` folder exists only when you install awscli and run the configure command, but it doesn't exist if you install boto3 with pip
2024-08-28T09:53:44.613Z	<Yuval Lifshitz> ok. so please create it. it would work with boto3 as well
2024-08-28T09:53:52.915Z	<Yuval Lifshitz> will add this to the doc
2024-08-28T10:21:09.141Z	<Rost Khudov> hmm, looks like just creating the `~/.aws/models/s3/2006-03-01/` directory and copying the json file there does not work with boto3 by default
2024-08-28T10:21:52.782Z	<Rost Khudov> but according to this [doc](https://github.com/boto/botocore/blob/develop/botocore/loaders.py#L33) it should
2024-08-28T10:22:17.717Z	<Yuval Lifshitz> never had issues with that locally or the test machines
2024-08-28T10:22:46.742Z	<Yuval Lifshitz> make sure you use the same user for running the test
2024-08-28T10:31:47.912Z	<Rost Khudov> when I copy to botocore/data/... it works, so it should work with the .aws folder as well. thank you for the help!
2024-08-28T10:33:42.585Z	<Yuval Lifshitz> interesting. where is this directory?
2024-08-28T10:35:04.344Z	<Rost Khudov> when you install package with pip it goes to `/usr/local/lib/python{version}/site-packages/` here is the full path: `/usr/local/lib/python3.9/site-packages/botocore/data/s3/2006-03-01/service-2.sdk-extras.json`
2024-08-28T10:35:25.059Z	<Yuval Lifshitz> thanks
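The copy step described above can be sketched in Python. This is a minimal sketch, assuming the extras file has already been downloaded from the Ceph examples directory; the function name `install_sdk_extras` and the `home` parameter are hypothetical and exist only for illustration. Note that Rost reported the per-user location not taking effect in his environment, so this reflects the documented procedure rather than a guaranteed fix.

```python
import shutil
from pathlib import Path


def install_sdk_extras(extras_file, home=None):
    """Copy an S3 sdk-extras file into the per-user botocore model directory.

    `home` is a hypothetical override so the function can be tried
    without touching the real home directory.
    """
    home = Path(home) if home is not None else Path.home()
    # Directory named in the conversation: ~/.aws/models/s3/2006-03-01/
    target_dir = home / ".aws" / "models" / "s3" / "2006-03-01"
    # Installing boto3 with pip does not create ~/.aws, so create it if needed.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "service-2.sdk-extras.json"
    shutil.copyfile(extras_file, target)
    return target
```

Calling `install_sdk_extras("service-2.sdk-extras.json")` as the same user that runs the tests places the file under that user's home directory, which matches the advice to use the same user for both steps.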
2024-08-28T10:36:45.233Z	<Rost Khudov> Can I ask to post link to PR with updated doc here? Maybe I will be able to add something
2024-08-28T10:38:16.840Z	<Yuval Lifshitz> i tried to summarize in a tracker: <https://tracker.ceph.com/issues/67768> feel free to add more info there, and you are very welcome to contribute and create a PR to fix the docs
2024-08-28T10:40:34.842Z	<Rost Khudov> I think I have in mind what I want to add to doc, will try to create PR today
2024-08-28T10:59:55.056Z	<Yuval Lifshitz> sounds good. will review once you have that
2024-08-28T11:19:33.825Z	<Matan Breizman> Hey, can you share the issue?
2024-08-28T11:38:10.696Z	<Yonatan Zaken> Sure, This is the output for `sudo ./install-deps.sh` on my WSL. ```dh binary dh_update_autotools_config dh_autoreconf create-stamp debian/debhelper-build-stamp dh_prep dh_auto_install --destdir=debian/ceph-build-deps/ dh_install dh_installdocs dh_installchangelogs dh_perl dh_link dh_strip_nondeterminism dh_compress dh_fixperms dh_missing dh_dwz dh_strip dh_makeshlibs dh_shlibdeps dh_installdeb dh_gencontrol dh_md5sums dh_builddeb dpkg-deb: error: control directory has bad permissions 777 (must be >=0755 and <=0775) dh_builddeb: error: dpkg-deb --root-owner-group --build debian/ceph-build-deps .. returned exit code 2 dh_builddeb: error: Aborting due to earlier error make: *** [debian/rules:3: binary] Error 2 dpkg-buildpackage: error: debian/rules binary subprocess returned exit status 2 Error in the build process: exit status 2 dpkg: error: cannot access archive 'ceph-build-deps_15.2.0-1_amd64.deb': No such file or directory mk-build-deps: dpkg --unpack failed``` I understood this might be because of umask or fmask values that are set on the mount directory that is used. Running `umask` i get: 0022 This is the `/etc/wsl.conf` content I currently have: ```[boot] systemd=true```
2024-08-28T13:08:37.103Z	<Matan Breizman> Looks WSL specific, did you try this: <https://www.reddit.com/r/bashonubuntuonwindows/comments/a7v5d8/problems_with_dpkgdeb_bad_permissions_how_do_i/>
2024-08-28T13:21:25.813Z	<Yonatan Zaken> I will try and update, thanks Matan
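The dpkg-deb error quoted above complains about the mode of the control directory. The following is a rough Python sketch of that kind of check, useful for confirming whether a directory on a WSL mount reports 777. It assumes that only 0755 and 0775 (0755 with or without the group-write bit) are acceptable, which is one reading of the range in the error message; the exact rule dpkg-deb applies is not shown in the log. Both function names are hypothetical.

```python
import os
import stat


def mode_is_acceptable(mode):
    """Return True for 0755 or 0775 (assumed reading of the dpkg-deb message)."""
    # Clear the group-write bit, then require exactly rwxr-xr-x.
    return (mode & ~stat.S_IWGRP) == 0o755


def check_control_dir(path):
    """Return (permission bits, acceptable?) for the directory at `path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode, mode_is_acceptable(mode)
```

For example, `check_control_dir("debian/ceph-build-deps/DEBIAN")` would report the bits dpkg-deb sees; the path here is only an illustration based on the `--destdir` in the output.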
2024-08-28T15:25:04.568Z	<Casey Bodley> weekly rgw meeting starting soon in [https://pad.ceph.com/p/rgw-weekly](https://meet.google.com/mmj-uzzv-qce)
2024-08-28T16:22:07.712Z	<Casey Bodley> rgw jobs on all releases started failing today with `AssertionError: remote smithi044.front.sepia.ceph.com has osd roles, but no osd devices were specified!`, any idea what changed?
2024-08-28T16:25:10.086Z	<Casey Bodley> ```2024-08-28T16:03:02.534 DEBUG:teuthology.misc:devs=['/dev/vg_nvme/lv_1', '/dev/vg_nvme/lv_2', '/dev/vg_nvme/lv_3', '/dev/vg_nvme/lv_4'] 2024-08-28T16:03:02.534 DEBUG:teuthology.orchestra.run.smithi044:> stat /dev/vg_nvme/lv_1 2024-08-28T16:03:02.588 DEBUG:teuthology.orchestra.run:got remote process result: 1 2024-08-28T16:03:02.588 INFO:teuthology.orchestra.run.smithi044.stderr:stat: cannot statx '/dev/vg_nvme/lv_1': No such file or directory 2024-08-28T16:03:02.588 DEBUG:teuthology.misc:get_scratch_devices: /dev/vg_nvme/lv_1 does not exist 2024-08-28T16:03:02.589 INFO:tasks.ceph:osd dev map: {} 2024-08-28T16:03:02.589 ERROR:teuthology.contextutil:Saw exception from nested tasks Traceback (most recent call last): File "/home/teuthworker/src/git.ceph.com_teuthology_cf5021f85c4b0bf435c74a1183036b3d19af44b5/teuthology/contextutil.py", line 30, in nested vars.append(enter()) File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__ return next(self.gen) File "/home/teuthworker/src/git.ceph.com_ceph-c_b3b2fa5e3c1cddde679d8fca5fc24bc1f25fe87a/qa/tasks/ceph.py", line 676, in cluster assert roles_to_devs, \ AssertionError: remote smithi044.front.sepia.ceph.com has osd roles, but no osd devices were specified!```
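The log above shows a chain: a candidate scratch device fails `stat`, the osd device map ends up empty, and the assertion fires. The sketch below reproduces that chain in simplified form. It is not teuthology's actual code; the function names `existing_devices` and `map_osd_roles` and the injectable `exists` parameter are assumptions made for illustration.

```python
import os


def existing_devices(candidates, exists=os.path.exists):
    """Keep only the candidate device paths that are present on the host."""
    return [dev for dev in candidates if exists(dev)]


def map_osd_roles(roles, devices):
    """Pair each osd role with a device; fail if osd roles got no devices."""
    roles_to_devs = dict(zip(roles, devices))
    if roles and not roles_to_devs:
        raise AssertionError("has osd roles, but no osd devices were specified!")
    return roles_to_devs
```

With none of the `/dev/vg_nvme/lv_*` volumes present, `existing_devices` returns an empty list and `map_osd_roles` raises, which is consistent with the hosts not having been provisioned before the job ran.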
2024-08-28T16:37:30.028Z	<Casey Bodley> it's showing up in the [fs suite](https://pulpito.ceph.com/pdonnell-2024-08-28_16:26:41-fs-wip-pdonnell-testing-20240828.032152-debug-distro-default-smithi/) too, cc @Patrick Donnelly @Venky Shankar
2024-08-28T16:47:08.816Z	<Patrick Donnelly> Yes, very suddenly
2024-08-28T16:48:40.268Z	<Patrick Donnelly> cc @Zack Cerza
2024-08-28T16:49:25.495Z	<Patrick Donnelly> I don't see a recent change to teuthology merged
2024-08-28T17:07:19.518Z	<Dan Mick> Seems more like a change to the job yml would cause this. Not saying there was one
2024-08-28T17:12:44.882Z	<Casey Bodley> started happening on quincy, squid and main at the same time, so unlikely due to changes in the suite branch
2024-08-28T17:13:25.680Z	<Zack Cerza> got a link to one handy?
2024-08-28T17:13:40.949Z	<Casey Bodley> <https://qa-proxy.ceph.com/teuthology/cbodley-2024-08-28_15:52:13-rgw-wip-67554-squid-distro-default-smithi/7878118/teuthology.log>
2024-08-28T17:14:05.848Z	<Casey Bodley> > ```2024-08-28T16:00:55.653 INFO:teuthology.orchestra.run.smithi105.stderr:stat: cannot statx '/dev/vg_nvme/lv_1': No such file or directory```
2024-08-28T17:19:33.892Z	<Zack Cerza> <https://qa-proxy.ceph.com/teuthology/cbodley-2024-08-28_15:52:13-rgw-wip-67554-squid-distro-default-smithi/7878118/ansible.log> ansible didn't run at all
2024-08-28T17:33:51.150Z	<Casey Bodley> any idea where that `/etc/ansible/hosts/sepia` file comes from?
2024-08-28T17:35:24.473Z	<Zack Cerza> yeah, ceph-sepia-secrets.git. The last time the file was changed was right around when sentry saw the first failure: <https://github.com/ceph/ceph-sepia-secrets/pull/904>
2024-08-28T17:37:59.741Z	<Zack Cerza> <https://docs.ansible.com/ansible/latest/inventory_guide/intro_inventory.html#inventory-basics-formats-hosts-and-groups>: "Group names should follow the same guidelines as [Creating valid variable names](https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_variables.html#valid-variable-names)." <https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_variables.html#valid-variable-names>: "A variable name cannot begin with a number" <https://github.com/ceph/ceph-sepia-secrets/pull/904/files#diff-6b0046333530400164979089341a41bd1f3459ad4f68d8e138e1c39f5fed13ccR734>: "2_jenkins_builders"
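The naming rule quoted above can be checked mechanically. This is a minimal sketch, assuming group names must consist of letters, digits, and underscores and must not begin with a digit; the first part of that rule comes from the same Ansible variable-naming page and is not quoted in the chat. The function name `is_valid_group_name` is hypothetical, and the sketch does not cover Ansible's reserved keywords.

```python
import re

# Letters, digits, underscores; the first character must not be a digit.
_VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_group_name(name):
    """Return True if `name` satisfies the simplified naming rule above."""
    return _VALID_NAME.fullmatch(name) is not None


for candidate in ("2_jenkins_builders", "jenkins_builders"):
    print(candidate, is_valid_group_name(candidate))
```

Under this rule the name `2_jenkins_builders` from the linked change is rejected because of its leading digit, while `jenkins_builders` passes.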
2024-08-28T17:48:29.612Z	<Zack Cerza> I self-merged this, which should resolve the issue: <https://github.com/ceph/ceph-sepia-secrets/pull/920>
2024-08-28T17:50:01.263Z	<Casey Bodley> thanks, rescheduling to test
2024-08-28T18:02:08.019Z	<Casey Bodley> @Zack Cerza still failing, ex [teuthology.log](https://qa-proxy.ceph.com/teuthology/cbodley-2024-08-28_17:49:24-rgw-wip-67554-squid-distro-default-smithi/7878301/teuthology.log) and [ansible.log](https://qa-proxy.ceph.com/teuthology/cbodley-2024-08-28_17:49:24-rgw-wip-67554-squid-distro-default-smithi/7878301/ansible.log)
2024-08-28T18:04:17.354Z	<Casey Bodley> `jenkins_builders` is listed under `[jenkins_builders:children]`
2024-08-28T18:15:07.489Z	<Zack Cerza> off
2024-08-28T18:15:12.565Z	<Zack Cerza> oof
2024-08-28T18:15:23.389Z	<Zack Cerza> ok, I think I have a fix for that too
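The second problem, a group listing itself under its own `:children` section, can be detected with a small scan of an INI-style inventory. The sketch below is a simplified line-based parser, not Ansible's inventory loader; the function name and the sample inventory are hypothetical, since the real `/etc/ansible/hosts/sepia` contents are not shown in the chat.

```python
# Hypothetical inventory fragment reproducing the reported pattern.
SAMPLE_INVENTORY = """\
[jenkins_builders:children]
jenkins_builders
other_builders

[other_builders]
host1.example.com
"""


def self_referencing_groups(inventory_text):
    """Return names of groups that list themselves under [name:children]."""
    offenders = []
    current = None  # group whose :children section is being read, if any
    for raw in inventory_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            header = line[1:-1]
            if header.endswith(":children"):
                current = header[: -len(":children")]
            else:
                current = None
            continue
        if current is not None and line.split()[0] == current:
            if current not in offenders:
                offenders.append(current)
    return offenders


print(self_referencing_groups(SAMPLE_INVENTORY))
```

Running a check like this before merging inventory changes would flag a rename that leaves a group as its own child, as happened after the leading digit was dropped.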
2024-08-28T18:29:23.410Z	<Dan Mick> thanks @Zack Cerza
2024-08-28T20:33:05.635Z	<Yonatan Zaken> Thanks this worked for me. For wsl users make sure to relaunch your wsl after editing the wsl.conf as in the link above 🙂
2024-08-28T20:56:12.541Z	<Samuel Just> Probably need to install grpc-devel package -- I don't think the dependencies were updated
2024-08-28T20:58:14.351Z	<Frank Filz> I have grpc-devel installed... It's been suggested to build without NVME so I've done that.
2024-08-28T21:03:38.198Z	<Samuel Just> Hmm, ok
2024-08-28T22:53:40.738Z	<Dan Mick> oh scratch devices, I remember those
