ceph - sepia - 2024-10-22

Timestamp (UTC) | Message
2024-10-22T14:47:51.349Z
<John Mulligan> I'm having the darndest time trying to do an interactive rerun for some failing teuth tests. I have tried to do a bit of debugging on my own to little avail.  Are there any known issues with SSH or FOG taking extra long recently?
Mainly the jobs just time out unable to get into the locked nodes after the initial loop.
2024-10-22T14:51:48.429Z
<Patrick Donnelly> i'm not aware of any ssh/fog issues presently
2024-10-22T14:52:04.319Z
<Patrick Donnelly> what teuthology-suite command are you using?
2024-10-22T17:14:58.219Z
<John Mulligan> `teuthology --archive $PWD/a -v --lock --block --interactive-on-error reruns/2024-10-21_1.yaml`  is what I've been running.
2024-10-22T17:15:08.648Z
<John Mulligan> I can share the yaml if you need it
2024-10-22T17:16:07.128Z
<John Mulligan> I also saw the following but that was from **after** the test already failed due to timeout:
```2024-10-22 14:46:01,776.776 INFO:teuthology.run:Summary data:
failure_reason: Expected smithi082's OS to be centos 9.stream but found ubuntu 22.04
owner: phlogistonjohn@teuthology```
2024-10-22T17:17:10.965Z
<John Mulligan> there's a test failure I cannot reproduce locally but occurs in the teuthology test... but I can't hunt it down because of this
2024-10-22T17:17:54.809Z
<John Mulligan> there's a test failure I cannot reproduce locally but occurs in the teuthology test... but I can't hunt it down because of issues just getting the "interactive rerun" commands working
2024-10-22T17:21:35.978Z
<Patrick Donnelly> I've not tried to do that before so not sure.
2024-10-22T17:26:09.065Z
<Zack Cerza> @John Mulligan can you share a logfile?
2024-10-22T17:52:20.358Z
<John Mulligan> <https://paste.centos.org/view/99d5980f>
2024-10-22T17:52:35.054Z
<John Mulligan> ^ @Zack Cerza
2024-10-22T17:53:39.607Z
<John Mulligan> FWIW, Adam was also having issues yesterday. I don't know if our problems are related or totally independent
2024-10-22T18:05:05.917Z
<Zack Cerza> definitely a bit of an odd one here. what machine is this being run on? and does anything change if you remove `archive_dir` from the job config?
2024-10-22T18:31:13.735Z
<John Mulligan> the 'teuthology vm'
2024-10-22T18:32:29.040Z
<John Mulligan> If I remove the archive option it fails:
```phlogistonjohn@teuthology:~$ teuthology  -v --lock --block  --interactive-on-error  reruns/2024-10-21_1.yaml
Traceback (most recent call last):
  File "/cephfs/home/phlogistonjohn/.local/bin/teuthology", line 8, in <module>
    sys.exit(main())
  File "/cephfs/home/phlogistonjohn/teuthology/scripts/run.py", line 38, in main
    teuthology.run.main(args)
  File "/cephfs/home/phlogistonjohn/teuthology/teuthology/run.py", line 342, in main
    set_up_logging(verbose, archive)
  File "/cephfs/home/phlogistonjohn/teuthology/teuthology/run.py", line 26, in set_up_logging
    os.mkdir(archive)
PermissionError: [Errno 13] Permission denied: '/home/teuthworker/archive/phlogistonjohn-2024-10-17_11:49:38-orch:cephadm-wip-phlogistonjohn-testing-1-2024-10-16-1345-distro-default-smithi/7953420'```
2024-10-22T18:37:52.813Z
<Patrick Donnelly> well, teuthology.front intentionally doesn't let you look at teuthworker's mount of teuthology (file system)
2024-10-22T18:38:00.095Z
<Patrick Donnelly> so the EPERM makes sense
2024-10-22T18:46:34.208Z
<John Mulligan> yes, I was just trying to answer the question 🙂
2024-10-22T18:56:07.766Z
<Patrick Donnelly> oh, ya 🙂
2024-10-22T19:30:53.198Z
<Zack Cerza> whoops, `archive_path` is the field I meant, and it's in your `reruns/2024-10-21_1.yaml`
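A minimal sketch of the suggestion above (dropping `archive_path` from the rerun job config), assuming the field is a single-line, top-level key in the YAML. The helper name `drop_top_level_key` and the sample config contents are hypothetical and only for illustration; the actual contents of `reruns/2024-10-21_1.yaml` were not shared in the channel.

```python
def drop_top_level_key(yaml_text: str, key: str) -> str:
    """Return yaml_text without the single-line top-level entry for `key`.

    Assumption: the key sits at column 0 and its value fits on one line.
    Nested or multi-line values would need a real YAML parser instead.
    """
    kept = [
        line
        for line in yaml_text.splitlines()
        if not line.startswith(f"{key}:")
    ]
    return "\n".join(kept) + "\n"


# Hypothetical job config; the real file's contents are unknown.
sample = (
    "archive_path: /some/archive/dir\n"
    "name: example-run\n"
    "tasks:\n"
    "- install: null\n"
)

cleaned = drop_top_level_key(sample, "archive_path")
print(cleaned)
```

With the key removed, the archive location would come only from the `--archive` option on the command line, which is what the earlier command already passes.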
2024-10-22T19:43:59.345Z
<Zack Cerza> oof. I get a 500 when trying to view hosts in the fog web ui. I see this in the apache log:
```[Tue Oct 22 19:42:17.918974 2024] [proxy_fcgi:error] [pid 18188] [client 172.21.0.100:62669] AH01071: Got error 'PHP message: PHP Fatal error:  Allowed memory size of 536870912 bytes exhausted (tried to allocate 28672 bytes) in /var/www/html/fog/lib/fog/fogcontroller.class.php on line 260\nPHP message: PHP Fatal error:  Allowed memory size of 536870912 bytes exhausted (tried to allocate 28672 bytes) in /var/www/html/fog/commons/init.php on line 428\n'```
@Dan Mick is it safe to reboot the host?
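For reference, the byte count in that Apache log line can be converted to a more readable unit. This is a rough sketch, not part of any FOG or PHP tooling: the regular expression and the function name `memory_limits_mib` are made up for illustration, and the sample string is an excerpt of the log line quoted above.

```python
import re

# Matches the PHP "memory exhausted" message as it appears in the log above.
PATTERN = re.compile(
    r"Allowed memory size of (\d+) bytes exhausted "
    r"\(tried to allocate (\d+) bytes\)"
)


def memory_limits_mib(log_text: str) -> list[float]:
    """Return the configured PHP memory limit, in MiB, for each match."""
    return [
        int(match.group(1)) / (1024 * 1024)
        for match in PATTERN.finditer(log_text)
    ]


# Excerpt of the log line quoted in the message above.
sample_log = (
    "PHP Fatal error:  Allowed memory size of 536870912 bytes exhausted "
    "(tried to allocate 28672 bytes) in "
    "/var/www/html/fog/lib/fog/fogcontroller.class.php on line 260"
)

print(memory_limits_mib(sample_log))  # [512.0]
```

The limit in the error works out to 512 MiB.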
2024-10-22T19:56:57.076Z
<Dan Mick> why tf is it using that much memory ffs php
2024-10-22T19:57:25.804Z
<Dan Mick> why don't we try restarting the php service first
2024-10-22T19:59:38.793Z
<Dan Mick> ...or something.  the task list shows reimages in progress, it'd be nicer not to kill them
2024-10-22T20:02:08.008Z
<Dan Mick> looks like mysqld is the memory pig.  maybe that's safeish to restart
2024-10-22T20:02:47.080Z
<John Mulligan> @Zack Cerza ok, I'll try removing that and seeing what happens
2024-10-22T20:04:28.131Z
<Dan Mick> restarted mysqld, didn't really help.  There's plenty of free memory
2024-10-22T20:04:37.830Z
<Dan Mick> php has its own internal limit that I raised some time back
2024-10-22T20:04:44.280Z
<Dan Mick> could probably raise it again
2024-10-22T20:07:54.398Z
<Kyrylo Shatskyy> mysqld? I thought teuthology was using postgresql, what is the mysqld needed for
2024-10-22T20:08:10.495Z
<Dan Mick> php_admin_value[memory_limit] = 512M in etc/php/7.2/fpm/pool.d/www.conf
2024-10-22T20:08:43.242Z
<Dan Mick> @Kyrylo Shatskyy fog
2024-10-22T20:09:56.728Z
<Kyrylo Shatskyy> oh...
2024-10-22T20:10:58.008Z
<Dan Mick> @Zack Cerza updated to 1G, reloaded php7.2-fpm service
2024-10-22T20:15:18.970Z
<Dan Mick> fwiw "list all hosts" works for me
2024-10-22T20:21:10.805Z
<John Mulligan> @Zack Cerza no dice. <https://paste.centos.org/view/a212ffa1>
2024-10-22T20:27:20.026Z
<Zack Cerza> listing all hosts worked before; viewing individuals still doesn't
2024-10-22T20:33:18.771Z
<Zack Cerza> @John Mulligan that output is identical to the paste from before
2024-10-22T21:16:13.684Z
<Zack Cerza> FOG upstream has an issue re: the memory issue here: <https://github.com/FOGProject/fogproject/issues/515>
Their "fix" is really a workaround that drops most of the history in the db so that queries don't get too big
