ceph - sepia - 2024-07-23

Timestamp (UTC) Message
2024-07-23T15:22:04.281Z
<Patrick Donnelly> @Dan Mick @Zack Cerza FYI the "Locked to capture FOG image for Jenkins build ..." smithi locks are not getting unlocked it looks like
2024-07-23T15:22:18.631Z
<Patrick Donnelly> about to run: `teuthology-lock --unlock --owner jenkins-build@teuthology smithi163.front.sepia.ceph.com smithi138.front.sepia.ceph.com smithi103.front.sepia.ceph.com smithi076.front.sepia.ceph.com smithi120.front.sepia.ceph.com smithi143.front.sepia.ceph.com smithi192.front.sepia.ceph.com smithi176.front.sepia.ceph.com smithi097.front.sepia.ceph.com smithi001.front.sepia.ceph.com smithi081.front.sepia.ceph.com smithi003.front.sepia.ceph.com smithi114.front.sepia.ceph.com smithi089.front.sepia.ceph.com smithi017.front.sepia.ceph.com smithi144.front.sepia.ceph.com smithi053.front.sepia.ceph.com smithi029.front.sepia.ceph.com smithi154.front.sepia.ceph.com smithi169.front.sepia.ceph.com smithi043.front.sepia.ceph.com smithi084.front.sepia.ceph.com smithi088.front.sepia.ceph.com smithi087.front.sepia.ceph.com smithi171.front.sepia.ceph.com`
2024-07-23T15:22:23.337Z
<Patrick Donnelly> that's quite a list...
2024-07-23T15:23:03.206Z
<Patrick Donnelly> sorry actually `teuthology-lock --unlock --owner jenkins-build@teuthology smithi063.front.sepia.ceph.com smithi187.front.sepia.ceph.com smithi150.front.sepia.ceph.com smithi107.front.sepia.ceph.com`
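[Editor's sketch] Pasting long host lists by hand is error-prone, so one option is to assemble the unlock invocation from a list. This is a minimal sketch: the helper name `build_unlock_command` is hypothetical, and only the `--unlock` and `--owner` flags quoted in the message above are used. It prints the command for review and does not execute it.

```python
import shlex


def build_unlock_command(owner, hosts):
    """Return the teuthology-lock unlock invocation as an argument list.

    Hypothetical helper: it only assembles the arguments quoted in the
    chat message; it does not talk to the lock server.
    """
    if not hosts:
        raise ValueError("no hosts given")
    return ["teuthology-lock", "--unlock", "--owner", owner, *hosts]


# The four nodes named in the corrected message.
hosts = [
    f"smithi{n}.front.sepia.ceph.com" for n in ("063", "187", "150", "107")
]
cmd = build_unlock_command("jenkins-build@teuthology", hosts)

# Print a shell-safe rendering so it can be reviewed before running.
print(shlex.join(cmd))
```

Keeping the command as an argument list means it could later be handed to `subprocess.run` without shell quoting concerns, though whether to automate the unlock at all is a judgment call for the lab admins.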
2024-07-23T15:24:33.374Z
<Patrick Donnelly> the longer list is for jobs scheduled by teuthology that have not been unlocked
2024-07-23T15:24:36.105Z
<Patrick Donnelly> i.e. nightlies
2024-07-23T15:25:35.603Z
<Patrick Donnelly> https://files.slack.com/files-pri/T1HG3J90S-F07EC9W586L/download/nodes_locked__24h
2024-07-23T15:58:50.100Z
<Patrick Donnelly> I've probably manually unlocked 2 dozen smithi nodes just now
2024-07-23T16:00:38.512Z
<Patrick Donnelly> we have two nodes locked by @Casey Bodley and @yuriw with description "reimage failed 10 times"
2024-07-23T16:00:49.238Z
<Patrick Donnelly> was that done automatically or manually?
2024-07-23T16:01:06.317Z
<Casey Bodley> i don't have anything locked manually
2024-07-23T16:01:36.770Z
<Patrick Donnelly> i wonder what part of teuthology may have done that automatically
2024-07-23T16:07:55.027Z
<Dan Mick> It's not teuth, it's a Jenkins job that I haven't had time to diagnose
2024-07-23T16:08:30.170Z
<Dan Mick> I'll try to keep a better eye on it
2024-07-23T16:13:43.731Z
<Zack Cerza> I filed <https://github.com/ceph/ceph-build/pull/2260> a couple weeks back, which would unlock nodes after the job fails. a workaround, but would save us having to clean up manually
2024-07-23T17:19:02.572Z
<Dan Mick> okay, caving
2024-07-23T17:45:23.943Z
<yuriw> I did not lock anything either
2024-07-23T17:51:56.978Z
<Laura Flores> @Patrick Donnelly do you know where the configurable is for teuthology job timeouts?
2024-07-23T17:52:03.803Z
<Laura Flores> (cc @Adam Kupczyk ^)
2024-07-23T17:57:59.634Z
<Ilya Dryomov> Hi Zack
Not related to the above and should be mostly harmless (I don't think it consumes any lab resources), but I wanted to raise jobs that stay in `waiting` state forever (e.g. `3 days, 22:14:53`) to you just in case:
<https://pulpito.ceph.com/dis-2024-07-19_19:31:37-rbd-main-distro-default-smithi/7808875>
<https://pulpito.ceph.com/dis-2024-07-19_19:30:20-rbd-main-distro-default-smithi/7808862>
<https://pulpito.ceph.com/dis-2024-07-19_19:30:20-rbd-main-distro-default-smithi/7808867>
2024-07-23T17:59:52.816Z
<Patrick Donnelly> I think it has to be set in the yaml but i've never tried to change it.
2024-07-23T18:06:22.582Z
<Laura Flores> Thanks Patrick. @Zack Cerza would you know?
2024-07-23T18:09:54.973Z
<Zack Cerza> `max_job_time`
2024-07-23T18:13:01.137Z
<Laura Flores> Thanks @Zack Cerza! Is this the area where it would need to be changed? <https://github.com/ceph/teuthology/blob/a8aed60fc62d4ff39d2216b360f018eba7518cfe/teuthology/config.py#L160>
2024-07-23T18:18:45.247Z
<Zack Cerza> defaults live there; in sepia, `~teuthworker/.teuthology.yaml` is what would control scheduled jobs' max runtime
2024-07-23T18:19:07.857Z
<Zack Cerza> currently `43200`s (12h)
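[Editor's sketch] Since `max_job_time` is expressed in seconds, a quick conversion helps when reading or proposing values. The snippet below is a minimal sketch: the helper names are hypothetical, the 6-hour figure is only an illustrative input, and the dictionary merely mimics what a config override might carry rather than reproducing the actual file format.

```python
SECONDS_PER_HOUR = 3600


def hours_to_seconds(hours):
    """Convert a duration in hours to whole seconds."""
    return int(hours * SECONDS_PER_HOUR)


def seconds_to_hours(seconds):
    """Convert a duration in seconds to (possibly fractional) hours."""
    return seconds / SECONDS_PER_HOUR


current_limit = 43200  # value quoted in the chat
print(seconds_to_hours(current_limit))  # 12.0

# Illustrative only: what a 6-hour limit would look like in seconds.
proposed = {"max_job_time": hours_to_seconds(6)}
print(proposed)  # {'max_job_time': 21600}
```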
2024-07-23T18:25:42.579Z
<Patrick Donnelly> maybe let's just change it to 6h across the board?
2024-07-23T18:25:52.031Z
<Patrick Donnelly> you can always override in a yaml?
2024-07-23T18:26:04.983Z
<Patrick Donnelly> (we discussed this in the CLT a while back)
2024-07-23T18:34:09.963Z
<Zack Cerza> I'm happy to make the change - we want it to be 6h for now?
2024-07-23T18:34:19.203Z
<Zack Cerza> I am finishing up work on job expiration this week
2024-07-23T19:39:56.811Z
<Eric I> my connections to teuthology.front.sepia.ceph.com close after 1 or 2 minutes. Is anyone else experiencing that too?
2024-07-23T20:25:41.864Z
<Zack Cerza> I've had one up for a couple hours. Are you timing out?
2024-07-23T20:28:58.404Z
<Eric I> I'm getting `client_loop: send disconnect: Broken pipe`. My internet connection seems stable, though....
2024-07-23T20:44:03.763Z
<Zack Cerza> VPN staying up as well?
2024-07-23T20:46:02.360Z
<Zack Cerza> nothing helpful in teuthology.front's journal that I can find
2024-07-23T22:08:03.679Z
<Laura Flores> Thanks Zack!
2024-07-23T22:18:20.053Z
<Zack Cerza> you're welcome! 🙂
