ceph - sepia - 2024-10-09

Timestamp (UTC) | Message
2024-10-09T00:35:57.202Z
<Dan Mick> hv02 is back up, and AFAIK the VMs are back up
2024-10-09T01:46:04.803Z
<Gian-Luca Casella> I'm not sure if it's just me, but I was wondering if anyone else is having an issue doing a `docker pull quay.ceph.io/ceph-ci/ceph:main`.
It appears `quay.ceph.io/ceph-ci/ceph:main` is automatically redirecting to
```https://quay-quay-quay.apps.os.sepia.ceph.com/ceph-ci/ceph:main/```
2024-10-09T06:12:48.406Z
<Nitzan Mordechai> @Adam Kraitman can you please check pulpito again? It looks like all machines are locked by jobs that are no longer communicating.
2024-10-09T07:16:02.285Z
<rzarzynski> @Adam Kraitman: `o06` is still out:

```$ ssh rzarzynski@o06.front.sepia.ceph.com
ssh: connect to host o06.front.sepia.ceph.com port 22: Connection timed out```
2024-10-09T07:27:04.914Z
<Sunil Angadi> @Adam Kraitman `reesi004.ceph.redhat.com` is also down
```[root@ceph-rbd2-sangadi-ms-ebmg82-node1-installer ~]# sudo mount -t nfs -o sec=sys,nfsvers=4.1 reesi004.ceph.redhat.com:/ /ceph
mount.nfs: Connection refused```
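A quick way to tell whether the whole host is down or only the NFS service is refusing connections (a diagnostic sketch using standard tools, not commands taken from the thread):
```
# Is the host reachable at all?
ping -c 3 reesi004.ceph.redhat.com

# Is anything listening on the NFSv4 port (2049)?
nc -vz reesi004.ceph.redhat.com 2049

# If rpcbind is reachable, list the RPC services the host advertises
rpcinfo -p reesi004.ceph.redhat.com
```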
2024-10-09T08:35:19.144Z
<Teoman Onay> when trying to download an image I get the following message:
```tonay:~$ podman pull quay.ceph.io/ceph-ci/ceph:main
Trying to pull quay.ceph.io/ceph-ci/ceph:main...
Error: parsing image configuration: Get "<https://quay-quay-quay.apps.os.sepia.ceph.com/_storage_proxy/ZXlKaGJHY2lPaUpTVXpJMU5pSXNJbXRwWkNJNklubEdXVXR1YTFCYVpYWmliWG95VG14UE5qVkJTRXQxYkRWRVZURnVjMU52VVZCdVJXbHdWa0YzVG5NaUxDSjBlWEFpT2lKS1YxUWlmUS5leUpwYzNNaU9pSnhkV0Y1SWl3aVlYVmtJam9pY1hWaGVTMXhkV0Y1TFhGMVlYa3VZWEJ3Y3k1dmN5NXpaWEJwWVM1alpYQm9MbU52YlNJc0ltNWlaaUk2TVRjeU9EUTJNamMwT0N3aWFXRjBJam94TnpJNE5EWXlOelE0TENKbGVIQWlPakUzTWpnME5qSTNOemdzSW5OMVlpSTZJbk4wYjNKaFoyVndjbTk0ZVNJc0ltRmpZMlZ6Y3lJNlczc2lkSGx3WlNJNkluTjBiM0poWjJWd2NtOTRlU0lzSW5WeWFTSTZJbkYxWVhrdlpHRjBZWE4wYjNKaFoyVXZjbVZuYVhOMGNua3ZjMmhoTWpVMkwySTNMMkkzTTJJMk1XVTVOR1E0TVRBeFpHWmtZbUk1WXpJME9XRTROREZtTW1VMk1tRmlNV1E0TWpVMll6RTJNalJrWWprMk5XTTBOVGN3TURFME1HVmlOREVfUVZkVFFXTmpaWE56UzJWNVNXUTlUamxEVUVvelZqZzNUekJZV1VKTVR6bEhORXNtVTJsbmJtRjBkWEpsUFVkTGJGWkpRa05pV2xaUU5ITWxNa0pZYURGR1VrOVRjMjV6ZVZsM0pUTkVKa1Y0Y0dseVpYTTlNVGN5T0RRMk16TTBPQ0lzSW1odmMzUWlPaUp5WldWemFUQXdNaTVtY205dWRDNXpaWEJwWVM1alpYQm9MbU52YlRvNE1DSXNJbk5qYUdWdFpTSTZJbWgwZEhBaWZWMHNJbU52Ym5SbGVIUWlPbnQ5ZlEuWDBPdlVJMWFMR2pQVGFMQk5EOW1HcEdDWDJzLWlIcTFHdG1ObzlkVGJLWjc2ZWxKekU2QmtEWWltQU03MXdsbUh1MXhyNEVxMGtHTUdzX096ZDc1YUhZV0dJcERIeWd4c3Q4VmQ2M1Y0WEppMER2aTdpNUpUUkpjRTRYUFpnQnJEMmFhZDdTLTFGWTdZaDRVWVc4T0kwUHItVEdzNFhGMFJTNXc5VEp3dE4yaGY5am1xanhZWjhLWVBHeFdVZUstY0RudURLbzNVQjhMY3EwRGMtNlNOV2F3Qng1b3drZVJRNnh6cU9JdFktUGxWUWx0cjMzNDRkMWcyS21ybnhicTZvaV9vd1F1VW0yWFlkWU5LOXlIMXlCRll3am5CMWlJZ3ByR0ZSRzNWTG1EQzl6bWFmRGpRN2RJM2tpZUtrN3BlUU55X244WlRtM2RHb3B1SkhtbmNB/http/reesi002.front.sepia.ceph.com:80/quay/datastorage/registry/sha256/b7/b73b61e94d8101dfdbb9c249a841f2e62ab1d8256c1624db965c45700140eb41?AWSAccessKeyId=N9CPJ3V87O0XYBLO9G4K&Signature=GKlVIBCbZVP4s%2BXh1FROSsnsyYw%3D&Expires=1728463348>": dial tcp: lookup [quay-quay-quay.apps.os.sepia.ceph.com](http://quay-quay-quay.apps.os.sepia.ceph.com): no such host```
I have been getting this since yesterday morning.
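The trailing `no such host` error suggests the registry is redirecting external pulls to a hostname that only resolves inside the lab. A minimal check from outside the lab (a sketch; hostnames are taken from the error output above):
```
# The public registry name should resolve from anywhere
dig +short quay.ceph.io

# The internal hostname in the redirect will not resolve outside the lab
dig +short quay-quay-quay.apps.os.sepia.ceph.com

# Retrying the pull shows whether the broken redirect is still being handed out
podman pull quay.ceph.io/ceph-ci/ceph:main
```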
2024-10-09T09:43:03.613Z
<Shraddha Agrawal> Hey folks, none of the jobs seem to be getting picked up and the queue keeps increasing. Can we please check this?
2024-10-09T09:54:49.297Z
<Lee Sanders> Yes, image pulling from quay looks to be quite sick.
2024-10-09T09:55:30.612Z
<Vallari Agrawal> Looks like the dispatcher is down: <https://grafana-route-grafana.apps.os.sepia.ceph.com/d/teuthology/teuthology?orgId=1&refresh=1m>
2024-10-09T10:57:55.245Z
<Aviv Caro> We have the same issue. @Adam Kraitman do you have any way to help with it?
2024-10-09T11:27:40.447Z
<Adam Kraitman> Hey I am going over the different lab issues, I will give an update soon
2024-10-09T13:56:20.317Z
<Adam Kraitman> Hey @Gian-Luca Casella, from where are you trying to docker pull? It seems the issue only affects users pulling from outside the sepia lab.
2024-10-09T13:57:27.630Z
<Adam Kraitman> Hey @Nitzan Mordechai, that should be resolved now that I restarted the services on teuthology.
2024-10-09T13:58:20.039Z
<Adam Kraitman> Hey @rzarzynski you can try now
2024-10-09T13:59:09.639Z
<Adam Kraitman> Please open a tracker ticket in the Octo project
2024-10-09T14:01:51.433Z
<Adam Kraitman> Hey, I am seeing that the issue only occurs when pulling from outside the lab, so I am checking whether something was changed yesterday that might have caused it.
2024-10-09T14:02:57.769Z
<Adam Kraitman> Hey @Shraddha Agrawal I think it's fixed now
2024-10-09T14:04:22.539Z
<Aviv Caro> Ok
2024-10-09T14:12:34.255Z
<Shraddha Agrawal> Oh thanks a lot Adam! I see it's working now 🙏
2024-10-09T14:29:35.355Z
<Sunil Angadi> ok done <https://tracker.ceph.com/issues/68462>
please check.
2024-10-09T14:42:47.656Z
<John Mulligan> FWIW, I saw in this channel that Zack altered some reverse proxy settings yesterday. The intent was to prevent those settings from crashing the proxy, but is it possible this change has had some unintended side effects?
2024-10-09T14:49:21.624Z
<yuriw> the smithi queue seems paused, is that on purpose? @Zack Cerza
2024-10-09T16:01:27.127Z
<Laura Flores> @yuriw it looks like there are lab problems in general (see above)
2024-10-09T16:04:02.322Z
<Laura Flores> Hey @Adam Kraitman how are things in the lab going? I assume some services are still expected to be down since the channel status still mentions possible DNS issues?
2024-10-09T16:10:37.281Z
<Laura Flores> @yuriw it seems like testing is back up though? Are you able to confirm/deny?
2024-10-09T16:11:46.284Z
<yuriw> tough to say, though pulpito shows running jobs
2024-10-09T17:15:30.882Z
<Adam Kraitman> The only issue I am seeing right now is with image pulls from quay.ceph.io from outside the lab; I don't see any issues in the test environment after restarting the teuthology services and unlocking stale testnodes.
2024-10-09T17:15:33.665Z
<Zack Cerza> ah sorry for not responding earlier @yuriw - jobs were running by the time I looked a couple hours back, but not as many as I expected to see. I did find a pile of nodes that were locked but should not have been, so cleaning those up now. things should pick back up shortly
2024-10-09T18:03:55.111Z
<Laura Flores> Thanks!
2024-10-09T18:56:26.414Z
<Dan Mick> the first CI containers using the new container build code are showing up in quay.ceph.io. 🎉🎉
2024-10-09T18:57:10.704Z
<Dan Mick> (and it seems like there are too many CI builds of code that has obvious C++ errors that should have been caught before push)
2024-10-09T19:15:07.316Z
<yuriw> Don't know why it's failing on c9:
https://jenkins.ceph.com/job/ceph-dev-new-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=centos9,DIST=centos9,MACHINE_SIZE=gigantic/83674//consoleFull

Is anybody else experiencing such failures?
2024-10-09T19:15:38.076Z
<yuriw> build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/dist/ceph-19.2.0-454-gadab5e4d/container
/tmp/jenkins15095396064514401342.sh: line 2024: cd: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/dist/ceph-19.2.0-454-gadab5e4d/container: No such file or directory

@Dan Mick @Laura Flores pls take a look
cc: @nehaojha
2024-10-09T19:18:55.438Z
<Laura Flores> @Adam Kraitman @Dan Mick can you take a look at this? This needs to be prioritized for fixing.
2024-10-09T19:19:52.125Z
<yuriw> (ref: <https://tracker.ceph.com/issues/68447>)
2024-10-09T20:04:14.110Z
<Dan Mick> first thought is that the container/ PR needs to be merged into that branch
2024-10-09T20:04:59.755Z
<Dan Mick> <https://github.com/ceph/ceph/pull/59868>
2024-10-09T20:05:26.068Z
<Dan Mick> that is, the branch probably needs rebasing on main
2024-10-09T20:05:46.125Z
<Dan Mick> which is something I could have thought about publicizing more clearly
2024-10-09T20:05:51.208Z
<Laura Flores> Can you share the link to whatever PR we need to make sure is included?
2024-10-09T20:05:56.687Z
<Dan Mick> ^
2024-10-09T20:06:07.199Z
<Laura Flores> Oh thx
2024-10-09T20:06:27.653Z
<Dan Mick> but let me look at the actual failure more closely
2024-10-09T20:06:58.041Z
<Laura Flores> Some are failing from compile issues that are separate. But the container issues are like what Yuri pasted above. 
2024-10-09T20:07:14.079Z
<Dan Mick> do you have a link to the actual build failure yuri
2024-10-09T20:07:28.878Z
<yuriw> <https://tracker.ceph.com/issues/68447>
2024-10-09T20:07:29.619Z
<Laura Flores> Check the original thread
2024-10-09T20:07:54.211Z
<Laura Flores> I'll re-link it: https://jenkins.ceph.com/job/ceph-dev-new-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=centos9,DIST=centos9,MACHINE_SIZE=gigantic/83674//consoleFull
2024-10-09T20:08:15.073Z
<Laura Flores> Thx for checking Dan
2024-10-09T20:08:48.620Z
<Dan Mick> yes, that's certainly the issue there.
2024-10-09T20:09:23.462Z
<Laura Flores> Gotcha. @yuriw it looks like you need to rebase all affected branches
2024-10-09T20:09:28.286Z
<Dan Mick> I hadn't thought about any backporting because I was thinking ceph-ci / main but maybe there is some backporting necessary, I'm fuzzy on that
2024-10-09T20:09:39.079Z
<yuriw> I will rebase then
2024-10-09T20:10:05.717Z
<Laura Flores> @yuriw at the moment this can only be fixed in main branches. So stable branches might need a backport. 
2024-10-09T20:10:24.461Z
<Laura Flores> @Dan Mick I assume you'll let us know if there are backports at play
2024-10-09T20:10:26.678Z
<Dan Mick> if it needs backport it should be a very simple one; it's a new directory
2024-10-09T20:10:35.174Z
<yuriw> ok waiting
2024-10-09T20:10:50.509Z
<Dan Mick> I could use advice there.  Are there branches not based on main in ceph-ci.git that need building with ceph-dev-new?
2024-10-09T20:11:00.890Z
<Laura Flores> Please go forth with main-based branches @yuriw 
2024-10-09T20:11:43.618Z
<yuriw> I rebased <https://tracker.ceph.com/issues/68445>
2024-10-09T20:12:12.613Z
<Laura Flores> @Dan Mick I can't check right now but my first line of action would be to check if any stable branch-based builds have failed from the same issue in Shaman
2024-10-09T20:12:54.450Z
<Laura Flores> I can check in about 20 minutes and get back to you 
2024-10-09T20:13:57.585Z
<Dan Mick> That sounds like you're saying such branches exist, which was the question (not whether they're failing). If there are branches that don't, and shouldn't, merge that PR as a matter of course but still need to build with ceph-dev-new, then yes, they will surely fail.
2024-10-09T20:15:02.452Z
<Dan Mick> and yes, of course there are, all the complexity in ceph-dev-new-trigger.  sigh.  ok.
2024-10-09T20:15:09.012Z
<Laura Flores> Yeah I guess I'm not sure if those branches are somehow built with a different CI job. But yes they definitely exist
2024-10-09T20:15:55.854Z
<Dan Mick> what's the right way to cause that to happen?  Should I create an issue so that I can set its backport fields and trigger some mechanism?
2024-10-09T20:16:14.369Z
<Dan Mick> should I tag the original PR with labels?
2024-10-09T20:18:28.071Z
<Dan Mick> <https://docs.ceph.com/en/reef/dev/developer_guide/essentials/#backporting> seems to imply the former
2024-10-09T20:18:50.012Z
<Dan Mick> just gonna grab some lunch, bbiab, sorry for the oversight but I will follow up
2024-10-09T20:22:26.972Z
<Laura Flores> Nw. To create a backport, you can follow the formal process here: <https://github.com/ceph/ceph/blob/main/SubmittingPatches-backports.rst> which would involve raising a tracker ticket, attaching the PR, and running the "backport-create-issue" script, which creates a backport tracker ticket, which you can then use when you run the "ceph-backport.sh" script to create a backport PR.

You can also simply checkout a new branch based on a stable branch, and `cherry-pick -x` the commits on that branch, then create a PR. That's all that the "ceph-backport.sh" script does.
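For reference, a rough sketch of the manual route described above; the remote names, branch name, and commit SHA are placeholders, not values from this thread:
```
# Start a backport branch from the stable branch (e.g. squid)
git fetch upstream squid
git checkout -b squid-backport-example upstream/squid

# Cherry-pick the commit(s) already merged to main; -x records the original SHA
git cherry-pick -x <commit-sha-from-main>

# Push to your fork and open a PR against the stable branch
git push origin squid-backport-example
```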
2024-10-09T20:22:39.503Z
<Laura Flores> How to use those scripts is all documented in that link. LMK if you have questions
2024-10-09T20:23:44.721Z
<Laura Flores> These are the important sections:
<https://github.com/ceph/ceph/blob/main/SubmittingPatches-backports.rst#creating-backport-tracker-issues>
<https://github.com/ceph/ceph/blob/main/SubmittingPatches-backports.rst#opening-a-backport-pr>
2024-10-09T20:36:03.992Z
<Gian-Luca Casella> @Adam Kraitman definitely doing it from outside of the sepia lab; it appears that the Ubuntu 24.04 cephadm package is pulling from the sepia lab environment.
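If the packaged cephadm really does default to a ceph-ci image, one possible workaround (not confirmed in this thread) is to point it at a public release image explicitly; the tag and monitor IP below are only examples:
```
# cephadm accepts a global --image option that overrides its default container image
cephadm --image quay.io/ceph/ceph:v19 bootstrap --mon-ip <MON_IP>
```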
2024-10-09T21:25:23.470Z
<Dan Mick> Created <https://tracker.ceph.com/issues/68467>
2024-10-09T21:28:15.356Z
<Dan Mick> is it better for me to go ahead and use the scripts to create the backport issue and PR, or should I rather defer to the backport team?
2024-10-09T21:39:32.314Z
<Laura Flores> There is no backport team. That should be updated/removed. cc @Zac Dover
2024-10-09T21:40:09.227Z
<Laura Flores> It should be rephrased to say, the author of the PR is responsible for creating their own backport.
2024-10-09T21:41:08.477Z
<Laura Flores> So, yes please go ahead and use the scripts
2024-10-09T21:44:50.871Z
<Laura Flores> (I created a ticket for this: <https://tracker.ceph.com/issues/68471>)
2024-10-09T21:54:32.256Z
<Dan Mick> oh
2024-10-09T21:54:33.147Z
<Dan Mick> ok
2024-10-09T21:59:30.451Z
<Laura Flores> Hey all, for those who have Shaman builds failing from container issues, please make sure your branch is rebased with <https://github.com/ceph/ceph/pull/59868>. Backports have not yet been merged, but those are to come.

The failure looks like this:
```build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/dist/ceph-19.2.0-454-gadab5e4d/container
/tmp/jenkins15095396064514401342.sh: line 2024: cd: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/dist/ceph-19.2.0-454-gadab5e4d/container: No such file or directory```
cc @Dan Mick
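For a main-based ceph-ci branch hitting that failure, the fix amounts to bringing the container/ directory from main into the branch and pushing it back so Jenkins rebuilds it. A sketch, with placeholder remote and branch names:
```
# Update the wip branch with current main, which now contains the container/ directory
git fetch upstream main
git checkout wip-my-testing-branch
git merge upstream/main    # or: git rebase upstream/main

# Push the updated branch to ceph-ci to trigger a fresh Shaman build
git push ceph-ci wip-my-testing-branch
```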
2024-10-09T22:19:41.799Z
<Dan Mick> PRs filed, tagged you for review
2024-10-09T22:21:23.452Z
<yuriw> I will rebase with <https://github.com/ceph/ceph/pull/60229>
Thx @Dan Mick!
2024-10-09T22:21:57.125Z
<Laura Flores> Approved!
2024-10-09T22:22:26.336Z
<Laura Flores> @yuriw keep in mind that none of the stable PRs have been merged yet, so simply rebasing against squid won't work. You probably already know that, but JFYI
2024-10-09T22:22:59.099Z
<yuriw> I will add it to the batch
2024-10-09T22:24:02.079Z
<Dan Mick> the checks are pointless for these PRs, but I don't know of a way to bypass them
2024-10-09T22:24:19.889Z
<Laura Flores> It's fine, we will just let them run
2024-10-09T22:24:24.564Z
<Laura Flores> Thanks Dan!
2024-10-09T22:26:28.149Z
<Dan Mick> (sent a followup email to sepia, too)
2024-10-09T23:40:37.439Z
<Samuel Just> I'm getting empty responses from pulpito log links (http://qa-proxy.ceph.com/teuthology/sjust-2024-10-08_01:33:53-crimson-rados-wip-sjust-crimson-testing-2024-10-01-distro-default-smithi/7938702/teuthology.log) -- is qa-proxy healthy?
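A quick way to see whether qa-proxy is returning an error, an empty body, or nothing at all (a sketch using the log URL from the message above):
```
# Fetch headers only; a 200 with Content-Length: 0 points at the proxy or backend,
# while a timeout points at the host itself
curl -sI "http://qa-proxy.ceph.com/teuthology/sjust-2024-10-08_01:33:53-crimson-rados-wip-sjust-crimson-testing-2024-10-01-distro-default-smithi/7938702/teuthology.log"
```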
