2024-10-09T00:35:57.202Z | <Dan Mick> hv02 is back up, and AFAIK the VMs are back up |
2024-10-09T01:46:04.803Z | <Gian-Luca Casella> I'm not sure if it's just me, but I was wondering if anyone else is having an issue doing a `docker pull quay.ceph.io/ceph-ci/ceph:main`
It appears `quay.ceph.io/ceph-ci/ceph:main` is automatically redirecting to
```https://quay-quay-quay.apps.os.sepia.ceph.com/ceph-ci/ceph:main/```
|
2024-10-09T06:12:48.406Z | <Nitzan Mordechai> @Adam Kraitman can you please check pulpito again? It looks like all machines are locked by jobs that are no longer communicating |
2024-10-09T07:16:02.285Z | <rzarzynski> @Adam Kraitman: `o06` is still out:
```$ ssh rzarzynski@o06.front.sepia.ceph.com
ssh: connect to host o06.front.sepia.ceph.com port 22: Connection timed out``` |
2024-10-09T07:27:04.914Z | <Sunil Angadi> @Adam Kraitman `reesi004.ceph.redhat.com` is also down
```[root@ceph-rbd2-sangadi-ms-ebmg82-node1-installer ~]# sudo mount -t nfs -o sec=sys,nfsvers=4.1 reesi004.ceph.redhat.com:/ /ceph
mount.nfs: Connection refused``` |
2024-10-09T08:35:19.144Z | <Teoman Onay> when trying to download an image I have the following message:
```tonay:~$ podman pull quay.ceph.io/ceph-ci/ceph:main
Trying to pull quay.ceph.io/ceph-ci/ceph:main...
Error: parsing image configuration: Get "<https://quay-quay-quay.apps.os.sepia.ceph.com/_storage_proxy/ZXlKaGJHY2lPaUpTVXpJMU5pSXNJbXRwWkNJNklubEdXVXR1YTFCYVpYWmliWG95VG14UE5qVkJTRXQxYkRWRVZURnVjMU52VVZCdVJXbHdWa0YzVG5NaUxDSjBlWEFpT2lKS1YxUWlmUS5leUpwYzNNaU9pSnhkV0Y1SWl3aVlYVmtJam9pY1hWaGVTMXhkV0Y1TFhGMVlYa3VZWEJ3Y3k1dmN5NXpaWEJwWVM1alpYQm9MbU52YlNJc0ltNWlaaUk2TVRjeU9EUTJNamMwT0N3aWFXRjBJam94TnpJNE5EWXlOelE0TENKbGVIQWlPakUzTWpnME5qSTNOemdzSW5OMVlpSTZJbk4wYjNKaFoyVndjbTk0ZVNJc0ltRmpZMlZ6Y3lJNlczc2lkSGx3WlNJNkluTjBiM0poWjJWd2NtOTRlU0lzSW5WeWFTSTZJbkYxWVhrdlpHRjBZWE4wYjNKaFoyVXZjbVZuYVhOMGNua3ZjMmhoTWpVMkwySTNMMkkzTTJJMk1XVTVOR1E0TVRBeFpHWmtZbUk1WXpJME9XRTROREZtTW1VMk1tRmlNV1E0TWpVMll6RTJNalJrWWprMk5XTTBOVGN3TURFME1HVmlOREVfUVZkVFFXTmpaWE56UzJWNVNXUTlUamxEVUVvelZqZzNUekJZV1VKTVR6bEhORXNtVTJsbmJtRjBkWEpsUFVkTGJGWkpRa05pV2xaUU5ITWxNa0pZYURGR1VrOVRjMjV6ZVZsM0pUTkVKa1Y0Y0dseVpYTTlNVGN5T0RRMk16TTBPQ0lzSW1odmMzUWlPaUp5WldWemFUQXdNaTVtY205dWRDNXpaWEJwWVM1alpYQm9MbU52YlRvNE1DSXNJbk5qYUdWdFpTSTZJbWgwZEhBaWZWMHNJbU52Ym5SbGVIUWlPbnQ5ZlEuWDBPdlVJMWFMR2pQVGFMQk5EOW1HcEdDWDJzLWlIcTFHdG1ObzlkVGJLWjc2ZWxKekU2QmtEWWltQU03MXdsbUh1MXhyNEVxMGtHTUdzX096ZDc1YUhZV0dJcERIeWd4c3Q4VmQ2M1Y0WEppMER2aTdpNUpUUkpjRTRYUFpnQnJEMmFhZDdTLTFGWTdZaDRVWVc4T0kwUHItVEdzNFhGMFJTNXc5VEp3dE4yaGY5am1xanhZWjhLWVBHeFdVZUstY0RudURLbzNVQjhMY3EwRGMtNlNOV2F3Qng1b3drZVJRNnh6cU9JdFktUGxWUWx0cjMzNDRkMWcyS21ybnhicTZvaV9vd1F1VW0yWFlkWU5LOXlIMXlCRll3am5CMWlJZ3ByR0ZSRzNWTG1EQzl6bWFmRGpRN2RJM2tpZUtrN3BlUU55X244WlRtM2RHb3B1SkhtbmNB/http/reesi002.front.sepia.ceph.com:80/quay/datastorage/registry/sha256/b7/b73b61e94d8101dfdbb9c249a841f2e62ab1d8256c1624db965c45700140eb41?AWSAccessKeyId=N9CPJ3V87O0XYBLO9G4K&Signature=GKlVIBCbZVP4s%2BXh1FROSsnsyYw%3D&Expires=1728463348>": dial tcp: lookup [quay-quay-quay.apps.os.sepia.ceph.com](http://quay-quay-quay.apps.os.sepia.ceph.com): no such host```
I've been getting this since yesterday morning |
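For context, the podman error above ends in "no such host" for the redirect target; a quick way to confirm that from outside the lab is a plain name lookup (a hypothetical diagnostic, not a command from the thread):
```# Outside the sepia lab this internal name is not expected to resolve,
# which matches the "no such host" error in the podman output above.
getent hosts quay-quay-quay.apps.os.sepia.ceph.com || echo "no such host"```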
2024-10-09T09:43:03.613Z | <Shraddha Agrawal> Hey folks, none of the jobs seem to be getting picked up, and the queue keeps increasing. Can we please check this? |
2024-10-09T09:54:49.297Z | <Lee Sanders> Yes, image pulling from quay looks to be quite sick. |
2024-10-09T09:55:30.612Z | <Vallari Agrawal> Looks like the dispatcher is down: <https://grafana-route-grafana.apps.os.sepia.ceph.com/d/teuthology/teuthology?orgId=1&refresh=1m> |
2024-10-09T10:57:55.245Z | <Aviv Caro> we have the same issue. @Adam Kraitman do you have any way to help on it? |
2024-10-09T11:27:40.447Z | <Adam Kraitman> Hey I am going over the different lab issues, I will give an update soon |
2024-10-09T13:56:20.317Z | <Adam Kraitman> Hey @Gian-Luca Casella From where are you trying to docker pull? It seems the issue only affects users pulling from outside the sepia lab |
2024-10-09T13:57:27.630Z | <Adam Kraitman> Hey @Nitzan Mordechai that should be solved after I restarted the services on teuthology |
2024-10-09T13:58:20.039Z | <Adam Kraitman> Hey @rzarzynski you can try now |
2024-10-09T13:59:09.639Z | <Adam Kraitman> Please open a tracker ticket in the Octo project |
2024-10-09T14:01:51.433Z | <Adam Kraitman> Hey I am seeing that the issue is only when pulling from outside the lab, so I am checking if something was changed yesterday that might have caused it |
2024-10-09T14:02:57.769Z | <Adam Kraitman> Hey @Shraddha Agrawal I think it's fixed now |
2024-10-09T14:04:22.539Z | <Aviv Caro> Ok |
2024-10-09T14:12:34.255Z | <Shraddha Agrawal> Oh thanks a lot Adam! I see it's working now |
2024-10-09T14:29:35.355Z | <Sunil Angadi> ok done <https://tracker.ceph.com/issues/68462>
please check. |
2024-10-09T14:42:47.656Z | <John Mulligan> FWIW I saw in this channel that Zack altered some reverse proxy settings yesterday. The intent was to prevent the reverse proxy from crashing. However, is it possible this change has had some unintended side effects? |
2024-10-09T14:49:21.624Z | <yuriw> the smithi queue seems paused, is it on purpose? @Zack Cerza |
2024-10-09T16:01:27.127Z | <Laura Flores> @yuriw it looks like there are lab problems in general (see above) |
2024-10-09T16:04:02.322Z | <Laura Flores> Hey @Adam Kraitman how are things in the lab going? I assume some services are still expected to be down since the channel status still mentions possible DNS issues? |
2024-10-09T16:10:37.281Z | <Laura Flores> @yuriw it seems like testing is back up though? Are you able to confirm/deny? |
2024-10-09T16:11:46.284Z | <yuriw> tough to say; pulpito shows running jobs |
2024-10-09T17:15:30.882Z | <Adam Kraitman> The only issue I am seeing right now is with pulls from quay.ceph.io from outside the lab; I don't see any issues in the test environment after restarting teuthology services and unlocking stale testnodes |
2024-10-09T17:15:33.665Z | <Zack Cerza> ah sorry for not responding earlier @yuriw - jobs were running by the time I looked a couple hours back, but not as many as I expected to see. I did find a pile of nodes that were locked but should not have been, so cleaning those up now. things should pick back up shortly |
2024-10-09T18:03:55.111Z | <Laura Flores> Thanks! |
2024-10-09T18:56:26.414Z | <Dan Mick> the first CI containers using the new container build code are showing up in quay.ceph.io. |
2024-10-09T18:57:10.704Z | <Dan Mick> (and it seems like there are too many CI builds of code that has obvious C++ errors that should have been caught before push) |
2024-10-09T19:15:07.316Z | <yuriw> Don't know why it's failing on c9:
<https://jenkins.ceph.com/job/ceph-dev-new-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=centos9,DIST=centos9,MACHINE_SIZE=gigantic/83674//consoleFull>
Is anybody else experiencing such failures? |
2024-10-09T19:15:38.076Z | <yuriw> ```build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/dist/ceph-19.2.0-454-gadab5e4d/container
/tmp/jenkins15095396064514401342.sh: line 2024: cd: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/dist/ceph-19.2.0-454-gadab5e4d/container: No such file or directory```
@Dan Mick @Laura Flores pls take a look
cc: @nehaojha |
2024-10-09T19:18:55.438Z | <Laura Flores> @Adam Kraitman @Dan Mick can you take a look at this? This needs to be prioritized for fixing. |
2024-10-09T19:19:52.125Z | <yuriw> (ref: <https://tracker.ceph.com/issues/68447>) |
2024-10-09T20:04:14.110Z | <Dan Mick> first thought is that the container/ PR needs to be merged into that branch |
2024-10-09T20:04:59.755Z | <Dan Mick> <https://github.com/ceph/ceph/pull/59868> |
2024-10-09T20:05:26.068Z | <Dan Mick> that is, the branch probably needs rebasing on main |
2024-10-09T20:05:46.125Z | <Dan Mick> which is something I could have publicized more clearly |
2024-10-09T20:05:51.208Z | <Laura Flores> Can you share the link to whatever PR we need to make sure is included? |
2024-10-09T20:05:56.687Z | <Dan Mick> ^ |
2024-10-09T20:06:07.199Z | <Laura Flores> Oh thx |
2024-10-09T20:06:27.653Z | <Dan Mick> but let me look at the actual failure more closely |
2024-10-09T20:06:58.041Z | <Laura Flores> Some are failing from compile issues that are separate. But the container issues are like what Yuri pasted above. |
2024-10-09T20:07:14.079Z | <Dan Mick> do you have a link to the actual build failure, Yuri? |
2024-10-09T20:07:28.878Z | <yuriw> <https://tracker.ceph.com/issues/68447> |
2024-10-09T20:07:29.619Z | <Laura Flores> Check the original thread |
2024-10-09T20:07:54.211Z | <Laura Flores> I'll re-link it: <https://jenkins.ceph.com/job/ceph-dev-new-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=centos9,DIST=centos9,MACHINE_SIZE=gigantic/83674//consoleFull> |
2024-10-09T20:08:15.073Z | <Laura Flores> Thx for checking Dan |
2024-10-09T20:08:48.620Z | <Dan Mick> yes, that's certainly the issue there. |
2024-10-09T20:09:23.462Z | <Laura Flores> Gotcha. @yuriw it looks like you need to rebase all affected branches |
2024-10-09T20:09:28.286Z | <Dan Mick> I hadn't thought about any backporting because I was thinking ceph-ci / main, but maybe some backporting is necessary; I'm fuzzy on that |
2024-10-09T20:09:39.079Z | <yuriw> I will rebase then |
2024-10-09T20:10:05.717Z | <Laura Flores> @yuriw at the moment this can only be fixed in main branches. So stable branches might need a backport. |
2024-10-09T20:10:24.461Z | <Laura Flores> @Dan Mick I assume you'll let us know if there are backports at play |
2024-10-09T20:10:26.678Z | <Dan Mick> if it needs backport it should be a very simple one; it's a new directory |
2024-10-09T20:10:35.174Z | <yuriw> ok waiting |
2024-10-09T20:10:50.509Z | <Dan Mick> I could use advice there. Are there branches not based on main in ceph-ci.git that need building with ceph-dev-new? |
2024-10-09T20:11:00.890Z | <Laura Flores> Please go forth with main-based branches @yuriw |
2024-10-09T20:11:43.618Z | <yuriw> I rebased <https://tracker.ceph.com/issues/68445> |
2024-10-09T20:12:12.613Z | <Laura Flores> @Dan Mick I can't check right now but my first line of action would be to check if any stable branch-based builds have failed from the same issue in Shaman |
2024-10-09T20:12:54.450Z | <Laura Flores> I can check in about 20 minutes and get back to you |
2024-10-09T20:13:57.585Z | <Dan Mick> that sounds like you're saying such branches exist, which was the question, not whether they're failing; if there are branches that don't and shouldn't merge that PR as a matter of course, but still need to build with ceph-dev-new, they will surely fail, yes. |
2024-10-09T20:15:02.452Z | <Dan Mick> and yes, of course there are, all the complexity in ceph-dev-new-trigger. sigh. ok. |
2024-10-09T20:15:09.012Z | <Laura Flores> Yeah I guess I'm not sure if those branches are somehow built with a different CI job. But yes they definitely exist |
2024-10-09T20:15:55.854Z | <Dan Mick> what's the right way to cause that to happen? Should I create an issue so that I can set its backport fields and trigger some mechanism? |
2024-10-09T20:16:14.369Z | <Dan Mick> should I tag the original PR with labels? |
2024-10-09T20:18:28.071Z | <Dan Mick> <https://docs.ceph.com/en/reef/dev/developer_guide/essentials/#backporting> seems to imply the former |
2024-10-09T20:18:50.012Z | <Dan Mick> just gonna grab some lunch, bbiab, sorry for the oversight but I will follow up |
2024-10-09T20:22:26.972Z | <Laura Flores> Nw. To create a backport, you can follow the formal process here: <https://github.com/ceph/ceph/blob/main/SubmittingPatches-backports.rst>. It involves raising a tracker ticket, attaching the PR, and running the "backport-create-issue" script, which creates a backport tracker ticket; you then use that ticket when you run the "ceph-backport.sh" script to create a backport PR.
You can also simply check out a new branch based on a stable branch, `cherry-pick -x` the commits onto that branch, then create a PR. That's all that the "ceph-backport.sh" script does. |
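A minimal sketch of the manual flow described above, assuming the fix is a single commit; the branch and commit names here are placeholders, not the actual backport:
```# start from an up-to-date copy of the target stable branch (names are placeholders)
git fetch origin
git checkout -b squid-backport-example origin/squid
# -x records the original commit hash in the backport commit message
git cherry-pick -x <original-commit-sha>
# push the branch and open a PR against the stable branch
git push origin squid-backport-example```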
2024-10-09T20:22:39.503Z | <Laura Flores> How to use those scripts is all documented in that link. LMK if you have questions |
2024-10-09T20:23:44.721Z | <Laura Flores> These are the important sections:
<https://github.com/ceph/ceph/blob/main/SubmittingPatches-backports.rst#creating-backport-tracker-issues>
<https://github.com/ceph/ceph/blob/main/SubmittingPatches-backports.rst#opening-a-backport-pr> |
2024-10-09T20:36:03.992Z | <Gian-Luca Casella> @Adam Kraitman definitely doing it from outside of the sepia lab; it appears the Ubuntu 24.04 cephadm package is pulling from the sepia lab environment. |
2024-10-09T21:25:23.470Z | <Dan Mick> Created <https://tracker.ceph.com/issues/68467> |
2024-10-09T21:28:15.356Z | <Dan Mick> is it better for me to go ahead and use the scripts to create the backport issue and PR, or should I rather defer to the backport team? |
2024-10-09T21:39:32.314Z | <Laura Flores> There is no backport team. That should be updated/removed. cc @Zac Dover |
2024-10-09T21:40:09.227Z | <Laura Flores> It should be rephrased to say that the author of the PR is responsible for creating their own backport. |
2024-10-09T21:41:08.477Z | <Laura Flores> So, yes please go ahead and use the scripts |
2024-10-09T21:44:50.871Z | <Laura Flores> (I created a ticket for this: <https://tracker.ceph.com/issues/68471>) |
2024-10-09T21:54:32.256Z | <Dan Mick> oh |
2024-10-09T21:54:33.147Z | <Dan Mick> ok |
2024-10-09T21:59:30.451Z | <Laura Flores> Hey all, for those who have Shaman builds failing from container issues, please make sure your branch is rebased so that it includes <https://github.com/ceph/ceph/pull/59868>. Backports have not yet been merged, but those are to come.
The failure looks like this:
```build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/dist/ceph-19.2.0-454-gadab5e4d/container
/tmp/jenkins15095396064514401342.sh: line 2024: cd: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/dist/ceph-19.2.0-454-gadab5e4d/container: No such file or directory```
cc @Dan Mick |
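For reference, a sketch of one way to bring a main-based ceph-ci branch up to date so it picks up the new container/ directory; the remote and branch names are assumptions, and force-pushing a shared branch should be coordinated:
```# assumes 'upstream' points at ceph/ceph.git and 'ceph-ci' at ceph/ceph-ci.git (placeholder remote names)
git fetch upstream main
git checkout my-testing-branch              # placeholder branch name
git rebase upstream/main                    # picks up the merged container/ change
git push --force ceph-ci my-testing-branch  # retriggers the Shaman/Jenkins build```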
2024-10-09T22:19:41.799Z | <Dan Mick> PRs filed, tagged you for review |
2024-10-09T22:21:23.452Z | <yuriw> I will rebase with <https://github.com/ceph/ceph/pull/60229>
Thx @Dan Mick! |
2024-10-09T22:21:57.125Z | <Laura Flores> Approved! |
2024-10-09T22:22:26.336Z | <Laura Flores> @yuriw keep in mind that none of the stable PRs have been merged yet, so simply rebasing against squid won't work. You probably already know that, but JFYI |
2024-10-09T22:22:59.099Z | <yuriw> I will add it to the batch |
2024-10-09T22:24:02.079Z | <Dan Mick> the checks are pointless for these PRs, but I don't know of a way to bypass them |
2024-10-09T22:24:19.889Z | <Laura Flores> It's fine, we will just let them run |
2024-10-09T22:24:24.564Z | <Laura Flores> Thanks Dan! |
2024-10-09T22:26:28.149Z | <Dan Mick> (sent a followup email to sepia, too) |
2024-10-09T23:40:37.439Z | <Samuel Just> I'm getting empty responses from pulpito log links (<http://qa-proxy.ceph.com/teuthology/sjust-2024-10-08_01:33:53-crimson-rados-wip-sjust-crimson-testing-2024-10-01-distro-default-smithi/7938702/teuthology.log>) -- is qa-proxy healthy? |