2024-07-23T15:22:04.281Z | <Patrick Donnelly> @Dan Mick @Zack Cerza FYI the "Locked to capture FOG image for Jenkins build ..." smithi locks are not getting unlocked it looks like |
2024-07-23T15:22:18.631Z | <Patrick Donnelly> about to run: `teuthology-lock --unlock --owner jenkins-build@teuthology smithi163.front.sepia.ceph.com smithi138.front.sepia.ceph.com smithi103.front.sepia.ceph.com smithi076.front.sepia.ceph.com smithi120.front.sepia.ceph.com smithi143.front.sepia.ceph.com smithi192.front.sepia.ceph.com smithi176.front.sepia.ceph.com smithi097.front.sepia.ceph.com smithi001.front.sepia.ceph.com smithi081.front.sepia.ceph.com smithi003.front.sepia.ceph.com smithi114.front.sepia.ceph.com smithi089.front.sepia.ceph.com smithi017.front.sepia.ceph.com smithi144.front.sepia.ceph.com smithi053.front.sepia.ceph.com smithi029.front.sepia.ceph.com smithi154.front.sepia.ceph.com smithi169.front.sepia.ceph.com smithi043.front.sepia.ceph.com smithi084.front.sepia.ceph.com smithi088.front.sepia.ceph.com smithi087.front.sepia.ceph.com smithi171.front.sepia.ceph.com` |
2024-07-23T15:22:23.337Z | <Patrick Donnelly> that's quite a list... |
2024-07-23T15:23:03.206Z | <Patrick Donnelly> sorry actually `teuthology-lock --unlock --owner jenkins-build@teuthology smithi063.front.sepia.ceph.com smithi187.front.sepia.ceph.com smithi150.front.sepia.ceph.com smithi107.front.sepia.ceph.com` |
2024-07-23T15:24:33.374Z | <Patrick Donnelly> the longer list is for jobs scheduled by teuthology that have not been unlocked |
2024-07-23T15:24:36.105Z | <Patrick Donnelly> i.e. nightlies |
2024-07-23T15:25:35.603Z | <Patrick Donnelly> https://files.slack.com/files-pri/T1HG3J90S-F07EC9W586L/download/nodes_locked__24h |
2024-07-23T15:58:50.100Z | <Patrick Donnelly> I've probably manually unlocked 2 dozen smithi nodes just now |
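For anyone repeating this cleanup, the unlock command quoted above could be assembled from a list of nodes roughly as follows. This is only a sketch: the `build_unlock_command` helper is hypothetical, and the only `teuthology-lock` flags it uses are the `--unlock` and `--owner` options that appear in the messages above. It prints the command instead of running it.

```python
import shlex


def build_unlock_command(owner, hosts):
    """Return the argv list for unlocking `hosts` held by `owner`.

    Hypothetical helper: it only mirrors the command shape quoted in
    the chat (teuthology-lock --unlock --owner OWNER HOST...).
    """
    if not hosts:
        raise ValueError("at least one host is required")
    return ["teuthology-lock", "--unlock", "--owner", owner, *hosts]


# Nodes taken from the corrected (shorter) list in the chat.
hosts = [
    "smithi063.front.sepia.ceph.com",
    "smithi187.front.sepia.ceph.com",
    "smithi150.front.sepia.ceph.com",
    "smithi107.front.sepia.ceph.com",
]
cmd = build_unlock_command("jenkins-build@teuthology", hosts)

# Print for review rather than executing anything against the lab.
print(shlex.join(cmd))
```

Reviewing the printed command before running it by hand keeps a mistaken host list from unlocking nodes that are still in use.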
2024-07-23T16:00:38.512Z | <Patrick Donnelly> we have two nodes locked by @Casey Bodley and @yuriw with description "reimage failed 10 times" |
2024-07-23T16:00:49.238Z | <Patrick Donnelly> was that done automatically or manually? |
2024-07-23T16:01:06.317Z | <Casey Bodley> i don't have anything locked manually |
2024-07-23T16:01:36.770Z | <Patrick Donnelly> i wonder what part of teuthology may have done that automatically |
2024-07-23T16:07:55.027Z | <Dan Mick> It's not teuth, it's a Jenkins job that I haven't had time to diagnose |
2024-07-23T16:08:30.170Z | <Dan Mick> I'll try to keep a better eye on it |
2024-07-23T16:13:43.731Z | <Zack Cerza> I filed <https://github.com/ceph/ceph-build/pull/2260> a couple weeks back, which would unlock nodes after the job fails. a workaround, but would save us having to clean up manually |
2024-07-23T17:19:02.572Z | <Dan Mick> okay, caving |
2024-07-23T17:45:23.943Z | <yuriw> I did not lock anything either |
2024-07-23T17:51:56.978Z | <Laura Flores> @Patrick Donnelly do you know where the configurable is for teuthology job timeouts? |
2024-07-23T17:52:03.803Z | <Laura Flores> (cc @Adam Kupczyk ^) |
2024-07-23T17:57:59.634Z | <Ilya Dryomov> Hi Zack
Not related to the above and should be mostly harmless (I don't think it consumes any lab resources), but I wanted to raise jobs that stay in `waiting` state forever (e.g. `3 days, 22:14:53`) to you just in case:
<https://pulpito.ceph.com/dis-2024-07-19_19:31:37-rbd-main-distro-default-smithi/7808875>
<https://pulpito.ceph.com/dis-2024-07-19_19:30:20-rbd-main-distro-default-smithi/7808862>
<https://pulpito.ceph.com/dis-2024-07-19_19:30:20-rbd-main-distro-default-smithi/7808867> |
2024-07-23T17:59:52.816Z | <Patrick Donnelly> I think it has to be set in the yaml but i've never tried to change it. |
2024-07-23T18:06:22.582Z | <Laura Flores> Thanks Patrick. @Zack Cerza would you know? |
2024-07-23T18:09:54.973Z | <Zack Cerza> `max_job_time` |
2024-07-23T18:13:01.137Z | <Laura Flores> Thanks @Zack Cerza! Is this the area where it would need to be changed? <https://github.com/ceph/teuthology/blob/a8aed60fc62d4ff39d2216b360f018eba7518cfe/teuthology/config.py#L160> |
2024-07-23T18:18:45.247Z | <Zack Cerza> defaults live there; in sepia, `~teuthworker/.teuthology.yaml` is what would control scheduled jobs' max runtime |
2024-07-23T18:19:07.857Z | <Zack Cerza> currently `43200`s (12h) |
2024-07-23T18:25:42.579Z | <Patrick Donnelly> maybe let's just change it to 6h across the board? |
2024-07-23T18:25:52.031Z | <Patrick Donnelly> you can always override in a yaml? |
2024-07-23T18:26:04.983Z | <Patrick Donnelly> (we discussed this in the CLT a while back) |
2024-07-23T18:34:09.963Z | <Zack Cerza> I'm happy to make the change - we want it to be 6h for now? |
2024-07-23T18:34:19.203Z | <Zack Cerza> I am finishing up work on job expiration this week |
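To make the numbers in the `max_job_time` discussion concrete, here is a small sketch of the hour-to-second conversion. It assumes the value is expressed in seconds, as the "`43200`s (12h)" message suggests; the `hours_to_seconds` helper and the printed `key: value` line are illustrative assumptions, not a copy of the actual `~teuthworker/.teuthology.yaml`.

```python
def hours_to_seconds(hours):
    """Convert a job time limit in hours to whole seconds."""
    return int(hours * 3600)


# Current limit mentioned in the chat: 12h.
current_limit = hours_to_seconds(12)

# Proposed limit from the chat: 6h.
proposed_limit = hours_to_seconds(6)

# Assumed shape of the config entry (key name from the chat, layout hypothetical).
print(f"max_job_time: {proposed_limit}")
```

Under that assumption, the proposed 6h limit corresponds to a value of 21600, half the current 43200.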
2024-07-23T19:39:56.811Z | <Eric I> my connections to [teuthology.front.sepia.ceph.com](http://teuthology.front.sepia.ceph.com) close after 1 or 2 minutes. Is anyone else experiencing that too? |
2024-07-23T20:25:41.864Z | <Zack Cerza> I've had one up for a couple hours. Are you timing out? |
2024-07-23T20:28:58.404Z | <Eric I> I'm getting `client_loop: send disconnect: Broken pipe`. My internet connection seems stable, though.... |
2024-07-23T20:44:03.763Z | <Zack Cerza> VPN staying up as well? |
2024-07-23T20:46:02.360Z | <Zack Cerza> nothing helpful in teuthology.front's journal that I can find |
2024-07-23T22:08:03.679Z | <Laura Flores> Thanks Zack! |
2024-07-23T22:18:20.053Z | <Zack Cerza> you're welcome! |