ceph - sepia - 2024-06-11

2024-06-11T13:12:17.608Z
<Leonid Usov> Redmine is slow - tried running a bulk operation on 8 issues. Loading for a long time. No crashes so far, but looks like it’s about to
2024-06-11T15:16:44.734Z
<Casey Bodley> getting several dead jobs per run due to `Error reimaging machines: Failed to power on smithi...` lately
2024-06-11T15:17:55.503Z
<Casey Bodley> reopening <https://tracker.ceph.com/issues/61163> to track it
2024-06-11T15:18:03.712Z
<yuriw> me too
2024-06-11T16:34:27.892Z
<Samuel Just> <https://tracker.ceph.com/issues/66339> Some log files accessed via the [qa-proxy.ceph.com](http://qa-proxy.ceph.com) seem to be double-gzipped.  Accessing the same file via cephfs directly behaves normally, so the behavior seems to be in the webserver.  I see this behavior whether downloading using a browser or wget.  Any ideas?
2024-06-11T16:37:59.820Z
<Casey Bodley> i've had to work around this for years by using `wget` instead of the browser to download logs from qa-proxy
2024-06-11T16:39:32.289Z
<Samuel Just> It's happening for me with wget as well.
2024-06-11T16:39:37.763Z
<Casey Bodley> interesting, that's new
2024-06-11T16:39:42.306Z
<Samuel Just> wget is normally how I get logs.
2024-06-11T16:41:04.674Z
<Samuel Just> Started happening for me about two weeks ago.
2024-06-11T16:46:33.654Z
<Samuel Just> wget --no-compression seems to avoid the problem
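A quick sanity check for this, sketched with a placeholder filename (not one of the actual affected logs): decompress once and see whether the result is still gzip data.
```
# Placeholder filename; substitute a log downloaded from qa-proxy.ceph.com
gzip -dc teuthology.log.gz | file -
# Reports "gzip compressed data" if the download was compressed twice,
# or "ASCII text" (or similar) for a normally gzipped log.

# Note: gzip -t only validates the outer layer; it won't flag double compression
gzip -t teuthology.log.gz && echo "outer gzip layer is valid"
```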
2024-06-11T16:47:08.324Z
<Casey Bodley> wget v1.21.4 is working for me
2024-06-11T16:48:05.673Z
<Samuel Just> wget 2.1.0 here -- note that it only seems to happen on some files
2024-06-11T16:48:23.574Z
<Samuel Just> ...I did upgrade my workstation recently to fedora 40 -- might be related
2024-06-11T16:49:04.802Z
<Samuel Just> If it's a bug on my side, it's shared by firefox and wget, though.
2024-06-11T16:50:12.943Z
<Samuel Just> Not chrome, though.  Must be a library shared between firefox and wget
2024-06-11T17:05:32.750Z
<Zack Cerza> interesting behavior, I wasn't aware of this. How many years do you think it's been, Casey? might help track down what caused the change
2024-06-11T17:06:20.951Z
<Samuel Just> FWIW, I'm now pretty sure the problem is something on my workstation not honoring a compression related header.
2024-06-11T17:06:31.313Z
<Casey Bodley> hard to guess how long it's been. but i vaguely recall discussing it with you, Zack
2024-06-11T17:06:52.226Z
<Samuel Just> ~~<https://tracker.ceph.com/issues/66339> Some log files accessed via the [qa-proxy.ceph.com](http://qa-proxy.ceph.com) seem to be double-gzipped.  Accessing the same file via cephfs directly behaves normally, so the behavior seems to be in the webserver.  I see this behavior whether downloading using a browser or wget.  Any ideas?~~  I think the problem is something on my workstation not honoring a compression header, nvm.
2024-06-11T17:09:10.181Z
<Zack Cerza> that's possible, I do forget things
2024-06-11T17:09:28.203Z
<Casey Bodley> Sam, `wget --no-compression` is probably causing the client to omit or send a different `Accept-Encoding` request header
2024-06-11T17:09:49.861Z
<Casey Bodley> that would cause the server to reply differently
2024-06-11T17:09:55.526Z
<Samuel Just> Right, thus avoiding whatever encoding/header issue wget is hitting.
2024-06-11T17:10:01.492Z
<Casey Bodley> --debug will dump headers
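A sketch of that check (the URL is a placeholder): compare the `Accept-Encoding` request header and the `Content-Encoding` response header with and without compression negotiation.
```
# wget --debug prints the HTTP request and response headers on stderr
wget --debug -O /dev/null https://qa-proxy.ceph.com/path/to/teuthology.log.gz 2>&1 \
  | grep -iE 'accept-encoding|content-encoding'

# curl cross-check: HEAD request with and without an explicit Accept-Encoding header
curl -sI -H 'Accept-Encoding: gzip' https://qa-proxy.ceph.com/path/to/teuthology.log.gz | grep -i content-encoding
curl -sI https://qa-proxy.ceph.com/path/to/teuthology.log.gz | grep -i content-encoding
```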
2024-06-11T17:10:15.009Z
<Zack Cerza> looking around a bit, I'm suspicious of this: [https://github.com/ceph/ceph-cm-ansible/commit/7008bc96184302a9cae5d6ca17d2ef5f75189415#diff-5d0681ce0b32f4f134bbd3a[…]9e0644ab82f474d74f0546667e6R8](https://github.com/ceph/ceph-cm-ansible/commit/7008bc96184302a9cae5d6ca17d2ef5f75189415#diff-5d0681ce0b32f4f134bbd3aae92af9978ad679e0644ab82f474d74f0546667e6R8)
2024-06-11T17:10:42.980Z
<Samuel Just> I mean, it seems totally reasonable to allow gzipped responses -- there are some really big plain text files.
2024-06-11T17:11:29.162Z
<Casey Bodley> yeah but the server shouldn't be re-compressing .gz files
2024-06-11T17:11:33.738Z
<Samuel Just> I was looking into this originally  because I feared I was seeing corrupted data in the sepia cephfs cluster after the upgrade.  Now that it seems to be specific to something on my workstation in fedora 40, I'm a lot less worried
2024-06-11T17:11:35.514Z
<Samuel Just> that's true
2024-06-11T17:12:07.084Z
<Zack Cerza> yeah it's clear, at least for the files i happened to look at, that they're compressed already on-disk
2024-06-11T17:12:34.142Z
<Samuel Just> I haven't observed anything that wasn't already gzipped showing this behavior
2024-06-11T17:25:22.259Z
<Zack Cerza> yeah the nginx setting doesn't seem like the culprit.
have you seen it on files whose uncompressed size is <128M? wondering about: <https://github.com/ceph/teuthology/commit/15af4a245af1c2b774fa0ac3aa03c4c1d00b46df>
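For anyone checking that, a way to get the uncompressed size of a local copy of a log (placeholder filename) to compare against the 128M threshold:
```
# Byte count after one decompression pass; 128M is 134217728 bytes
zcat teuthology.log.gz | wc -c

# gzip -l shows the stored sizes too (its uncompressed-size field is only 32 bits,
# so it wraps for very large logs)
gzip -l teuthology.log.gz
```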
2024-06-11T18:09:21.836Z
<Rishabh Dave> same
2024-06-11T18:19:04.279Z
<Rishabh Dave> Hi all, I am still seeing this issue. It doesn't look like it's related to the centos8 fallout or a connection loss.
Using SHA1 `4fb7c8eca39e688af307c2a6f2b70f15f122a606` leads to this error:
```
$ teuthology-suite -p 50 --force-priority -m smithi -k testing -s fs:functional -c quincy -S 4fb7c8eca39e688af307c2a6f2b70f15f122a606 -l 1
...
teuthology.exceptions.ScheduleFailError: Scheduling rishabh-2024-06-11_18:07:44-fs:functional-quincy-testing-default-smithi failed: '4fb7c8eca39e688af307c2a6f2b70f15f122a606' not found in repo: ceph-ci.git!
```
**BUT** using the old SHA1 `658e3c7068357222a961b3107ed1c91a5ab3a893` works fine:
```
$ teuthology-suite -p 80 --force-priority -m smithi -k testing -s fs:functional -c quincy -S 658e3c7068357222a961b3107ed1c91a5ab3a893 -l 1 --dry-run
...
2024-06-11 18:15:21,726.726 INFO:teuthology.suite.run:Test results viewable at <https://pulpito.ceph.com/rishabh-2024-06-11_18:15:19-fs:functional-quincy-testing-default-smithi/>
```
2024-06-11T18:24:29.643Z
<Rishabh Dave> The same happens with reef:

```
$ teuthology-suite -p 50 --force-priority -m smithi -k testing -s fs:functional -c reef -S 55a56d9f0fe9010ad225380bd1eea252c51d834d -l 1
...
teuthology.exceptions.ScheduleFailError: Scheduling rishabh-2024-06-11_18:20:21-fs:functional-reef-testing-default-smithi failed: '55a56d9f0fe9010ad225380bd1eea252c51d834d' not found in repo: ceph-ci.git!
```
2024-06-11T18:27:30.901Z
<Rishabh Dave> But not with Squid:

```
$ teuthology-suite -p 50 --force-priority -m smithi -k testing -s fs:functional -c squid -S 3f51ea3944d39ca13aa40153c9a4ee298306679d -l 1
...
teuthology.exceptions.ScheduleFailError: Scheduling rishabh-2024-06-11_18:24:23-fs:functional-squid-testing-default-smithi failed: Packages for os_type 'centos', flavor default and ceph hash '3f51ea3944d39ca13aa40153c9a4ee298306679d' not found
$ teuthology-suite -p 50 --force-priority -m smithi -k testing -s fs:functional -c squid -S 3f51ea3944d39ca13aa40153c9a4ee298306679d -l 1 -d ubuntu
...
2024-06-11 18:26:42,446.446 INFO:teuthology.suite.run:Test results viewable at <https://pulpito.ceph.com/rishabh-2024-06-11_18:26:38-fs:functional-squid-testing-default-smithi/>
```
2024-06-11T19:11:35.873Z
<Zack Cerza> hm, I think it's significant that these are release branches. aren't the canonical copies of those in ceph.git as opposed to ceph-ci?
2024-06-11T19:20:48.256Z
<Dan Mick> the release branches, yes, but ceph-ci has the evolving "next" distinguished branches too (as they grow) AIUI
2024-06-11T19:21:36.672Z
<Dan Mick> commit 9aa523302d708a6b7b80d8bdd462ebc306fb50d2 (refs/remotes/ci/quincy)
commit db0330b1e4e2470d52b750e251e55a522b4f7d69 (refs/remotes/ci/squid)
2024-06-11T19:21:43.864Z
<Dan Mick> (etc.  ci is my remote name)
2024-06-11T19:22:55.812Z
<Zack Cerza> the hashes posted today that aren't being found in ceph-ci.git are indeed not there - on github or the mirror - so i don't think this is a sync issue
2024-06-11T19:23:40.774Z
<Dan Mick> I thought that too, because github search can't find them for some strange reason, but I can find at least one of them in my repo
2024-06-11T19:24:20.965Z
<Zack Cerza> _some_ of the release branches are sort-of up-to-date on ceph-ci, but not all. people must sync them manually occasionally
2024-06-11T19:24:35.705Z
<Zack Cerza> (same goes for `main` of course)
2024-06-11T19:24:38.789Z
<Dan Mick> $ git show 086e633da00cf25bd1c1c7d658229b6617c08335
commit 086e633da00cf25bd1c1c7d658229b6617c08335
Merge: 6ee68927878 2b813da9df5
Author: Zac Dover <zac.dover@proton.me>
Date:   Thu Jun 6 06:46:00 2024 +1000

    Merge pull request #57904 from zdover23/wip-doc-2024-06-06-backport-57900-to-quincy

    quincy: doc/start: s/intro.rst/index.rst/

    Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
2024-06-11T19:25:18.982Z
<Zack Cerza> which remote is that on though? i don't see it in ceph-ci.git.
2024-06-11T19:25:50.222Z
<Dan Mick> I...am not sure how to answer that
2024-06-11T19:27:37.531Z
<Dan Mick> I guess branch -a --contains will, and indeed it seems only to be in ceph.git
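Roughly what that looks like with both repos fetched (remote names `ceph` and `ci` are just examples; the SHA1 is the quincy one from earlier):
```
# Example remote names; adjust to match your setup
git remote add ceph https://github.com/ceph/ceph.git
git remote add ci   https://github.com/ceph/ceph-ci.git
git fetch --all

# List every local and remote-tracking branch that contains the commit
git branch -a --contains 4fb7c8eca39e688af307c2a6f2b70f15f122a606
```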
2024-06-11T19:31:25.478Z
<Zack Cerza> so I think what this is really about is that teuthology is normally used to test _pre-release_ branches from ceph-ci.git, and the --ceph-repo flag is pointed at ceph.git when release branches need to be tested
2024-06-11T19:32:17.969Z
<Zack Cerza>
```
$ teuthology-suite --help | egrep ceph.repo
  --ceph-repo <ceph_repo>     Query this repository for Ceph branch and SHA1
```
2024-06-11T19:33:12.103Z
<Zack Cerza> the nightlies do:
```
teuthology@teuthology:~$ crontab -l | grep repo
TEUTHOLOGY_SUITE_ARGS="--non-interactive --newest=100 --ceph-repo=<https://git.ceph.com/ceph.git> --suite-repo=<https://git.ceph.com/ceph.git> --machine-type smithi"
```
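So for the release-branch schedules above, the workaround would presumably be the same: point `--ceph-repo` (and `--suite-repo`) at ceph.git. An untested sketch based on Rishabh's quincy command:
```
teuthology-suite -p 50 --force-priority -m smithi -k testing -s fs:functional -c quincy \
  -S 4fb7c8eca39e688af307c2a6f2b70f15f122a606 -l 1 --dry-run \
  --ceph-repo https://git.ceph.com/ceph.git \
  --suite-repo https://git.ceph.com/ceph.git
```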
2024-06-11T19:33:52.654Z
<Dan Mick> yeah.  I swear I found one of them in ci but it may have been far down in the head
2024-06-11T19:35:29.156Z
<Zack Cerza> I didn't trust myself to keep all that perfectly straight so I used separate clones for each combination of "ceph(|-ci).git" and "git(hub|.ceph).com"
2024-06-11T23:10:47.426Z
<Zack Cerza> I noticed our retry mechanism for ipmi during reimaging was a bit naïve; testing a potential improvement: <https://github.com/ceph/teuthology/pull/1955>
