2024-07-24T07:15:52.308Z | <Ilya Dryomov> @Zack Cerza I have some additional data for the
> Re: recent uptick of test jobs that get stuck and cleaned up only after 12h timeout
thread. In my krbd run, I have at least 7 jobs that got stuck at or around
```2024-07-24T01:56:04.054 INFO:teuthology.run_tasks:Running task internal.base...
2024-07-24T01:56:04.061 INFO:teuthology.task.internal:Creating test directory...
2024-07-24T01:56:04.062 DEBUG:teuthology.orchestra.run.smithi003:> mkdir -p -m0755 -- /home/ubuntu/cephtest
2024-07-24T01:56:04.067 DEBUG:teuthology.orchestra.run.smithi016:> mkdir -p -m0755 -- /home/ubuntu/cephtest
2024-07-24T01:56:04.075 DEBUG:teuthology.orchestra.run.smithi089:> mkdir -p -m0755 -- /home/ubuntu/cephtest```
shortly after rebooting into the testing kernel. Remembering the
> Something must have flushed when the job reached the 12 hour mark, likely as part of it getting killed
point, I killed one of them and checked `teuthology.log`. This showed up after the kill:
```2024-07-24T06:58:41.243 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_teuthology_a8aed60fc62d4ff39d2216b360f018eba7518cfe/teuthology/run_tasks.py", line 109, in run_tasks
manager.__enter__()
File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
return next(self.gen)
File "/home/teuthworker/src/git.ceph.com_teuthology_a8aed60fc62d4ff39d2216b360f018eba7518cfe/teuthology/task/internal/__init__.py", line 41, in base
run.wait(
File "/home/teuthworker/src/git.ceph.com_teuthology_a8aed60fc62d4ff39d2216b360f018eba7518cfe/teuthology/orchestra/run.py", line 479, in wait
proc.wait()
File "/home/teuthworker/src/git.ceph.com_teuthology_a8aed60fc62d4ff39d2216b360f018eba7518cfe/teuthology/orchestra/run.py", line 143, in wait
status = self._get_exitstatus()
File "/home/teuthworker/src/git.ceph.com_teuthology_a8aed60fc62d4ff39d2216b360f018eba7518cfe/teuthology/orchestra/run.py", line 192, in _get_exitstatus
status = self._stdout_buf.channel.recv_exit_status()
File "/home/teuthworker/src/git.ceph.com_teuthology_a8aed60fc62d4ff39d2216b360f018eba7518cfe/virtualenv/lib/python3.10/site-packages/paramiko/channel.py", line 400, in recv_exit_status
self.status_event.wait()
File "src/gevent/event.py", line 163, in gevent._gevent_cevent.Event.wait
File "src/gevent/_abstract_linkable.py", line 521, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait
File "src/gevent/_abstract_linkable.py", line 487, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
File "src/gevent/_abstract_linkable.py", line 490, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
File "src/gevent/_abstract_linkable.py", line 442, in gevent._gevent_c_abstract_linkable.AbstractLinkable._AbstractLinkable__wait_to_be_notified
File "src/gevent/_abstract_linkable.py", line 451, in gevent._gevent_c_abstract_linkable.AbstractLinkable._switch_to_hub
File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
File "src/gevent/_greenlet_primitives.py", line 65, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
File "src/gevent/_gevent_c_greenlet_primitives.pxd", line 35, in gevent._gevent_c_greenlet_primitives._greenlet_switch
gevent.exceptions.LoopExit: This operation would block forever
Hub: <Hub '' at 0x7f9e40f9be70 epoll default pending=0 ref=0 fileno=4 thread_ident=0x7f9e435ce740>
Handles:
[]
2024-07-24T06:58:41.530 ERROR:teuthology.util.sentry: Sentry event: <https://sentry.ceph.com/organizations/ceph/?query=839c45d450f941faa45b97f3bdacccea>
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_teuthology_a8aed60fc62d4ff39d2216b360f018eba7518cfe/teuthology/run_tasks.py", line 109, in run_tasks
manager.__enter__()
File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
return next(self.gen)
File "/home/teuthworker/src/git.ceph.com_teuthology_a8aed60fc62d4ff39d2216b360f018eba7518cfe/teuthology/task/internal/__init__.py", line 41, in base
run.wait(
File "/home/teuthworker/src/git.ceph.com_teuthology_a8aed60fc62d4ff39d2216b360f018eba7518cfe/teuthology/orchestra/run.py", line 479, in wait
proc.wait()
File "/home/teuthworker/src/git.ceph.com_teuthology_a8aed60fc62d4ff39d2216b360f018eba7518cfe/teuthology/orchestra/run.py", line 143, in wait
status = self._get_exitstatus()
File "/home/teuthworker/src/git.ceph.com_teuthology_a8aed60fc62d4ff39d2216b360f018eba7518cfe/teuthology/orchestra/run.py", line 192, in _get_exitstatus
status = self._stdout_buf.channel.recv_exit_status()
File "/home/teuthworker/src/git.ceph.com_teuthology_a8aed60fc62d4ff39d2216b360f018eba7518cfe/virtualenv/lib/python3.10/site-packages/paramiko/channel.py", line 400, in recv_exit_status
self.status_event.wait()
File "src/gevent/event.py", line 163, in gevent._gevent_cevent.Event.wait
File "src/gevent/_abstract_linkable.py", line 521, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait
File "src/gevent/_abstract_linkable.py", line 487, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
File "src/gevent/_abstract_linkable.py", line 490, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
File "src/gevent/_abstract_linkable.py", line 442, in gevent._gevent_c_abstract_linkable.AbstractLinkable._AbstractLinkable__wait_to_be_notified
File "src/gevent/_abstract_linkable.py", line 451, in gevent._gevent_c_abstract_linkable.AbstractLinkable._switch_to_hub
File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
File "src/gevent/_greenlet_primitives.py", line 65, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
File "src/gevent/_gevent_c_greenlet_primitives.pxd", line 35, in gevent._gevent_c_greenlet_primitives._greenlet_switch
gevent.exceptions.LoopExit: This operation would block forever
Hub: <Hub '' at 0x7f9e40f9be70 epoll default pending=0 ref=1 fileno=4 thread_ident=0x7f9e435ce740>
Handles:
[]
2024-07-24T06:58:41.533 DEBUG:teuthology.run_tasks:Unwinding manager internal.base
2024-07-24T06:58:41.540 DEBUG:teuthology.run_tasks:Unwinding manager kernel
2024-07-24T06:58:41.555 DEBUG:teuthology.run_tasks:Unwinding manager console_log```
<https://qa-proxy.ceph.com/teuthology/dis-2024-07-24_01:37:18-krbd-main-wip-exclusive-option-states-default-smithi/7815575/teuthology.log>
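For spotting this kind of stall in other jobs, here is a minimal sketch that scans log lines for large jumps between consecutive timestamps, such as the jump from 01:56:04 to 06:58:41 above. The names `find_stalls` and `TS_RE` and the one-hour default threshold are hypothetical and not part of teuthology. It assumes every timestamped line starts with the `YYYY-MM-DDTHH:MM:SS.mmm` prefix seen in the excerpts.

```python
import re
from datetime import datetime

# Leading timestamp of a log line, e.g. "2024-07-24T01:56:04.054 INFO:..."
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}) ")


def find_stalls(lines, threshold_seconds=3600.0):
    """Return (prev_line, next_line, gap_seconds) for each gap above the threshold.

    Hypothetical helper: lines without a leading timestamp (for example
    traceback frames) are skipped and do not reset the previous timestamp.
    """
    stalls = []
    prev_ts = None
    prev_line = None
    for line in lines:
        match = TS_RE.match(line)
        if not match:
            continue
        ts = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if gap > threshold_seconds:
                stalls.append((prev_line, line, gap))
        prev_ts, prev_line = ts, line
    return stalls


if __name__ == "__main__":
    sample = [
        "2024-07-24T01:56:04.075 DEBUG:teuthology.orchestra.run.smithi089:> mkdir -p -m0755 -- /home/ubuntu/cephtest",
        "2024-07-24T06:58:41.243 ERROR:teuthology.run_tasks:Saw exception from tasks.",
        "Traceback (most recent call last):",
        "2024-07-24T06:58:41.533 DEBUG:teuthology.run_tasks:Unwinding manager internal.base",
    ]
    for before, after, gap in find_stalls(sample):
        print(f"{gap:.0f}s of silence after: {before}")
```

Run against a full `teuthology.log`, this would flag the last line logged before a job went quiet, which makes it easier to check whether other stuck jobs stop at the same `mkdir` step.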
Have you seen this before? |