2024-10-22T14:47:51.349Z | <John Mulligan> I'm having the darndest time trying to do an interactive rerun for some failing teuth tests. I have tried to do a bit of debugging on my own to little avail. Are there any known issues with SSH or FOG taking extra long recently?
Mainly the jobs just time out, unable to get into the locked nodes after the initial loop. |
2024-10-22T14:51:48.429Z | <Patrick Donnelly> i'm not aware of any ssh/fog issues presently |
2024-10-22T14:52:04.319Z | <Patrick Donnelly> what teuthology-suite command are you using? |
2024-10-22T17:14:58.219Z | <John Mulligan> `teuthology --archive $PWD/a -v --lock --block --interactive-on-error reruns/2024-10-21_1.yaml` is what I've been running. |
2024-10-22T17:15:08.648Z | <John Mulligan> I can share the yaml if you need it |
2024-10-22T17:16:07.128Z | <John Mulligan> I also saw the following but that was from **after** the test already failed due to timeout:
```2024-10-22 14:46:01,776.776 INFO:teuthology.run:Summary data:
failure_reason: Expected smithi082's OS to be centos 9.stream but found ubuntu 22.04
owner: phlogistonjohn@teuthology``` |
2024-10-22T17:17:54.809Z | <John Mulligan> there's a test failure I cannot reproduce locally but that occurs in the teuthology test... but I can't hunt it down because of issues just getting the "interactive rerun" commands working |
2024-10-22T17:21:35.978Z | <Patrick Donnelly> I've not tried to do that before so not sure. |
2024-10-22T17:26:09.065Z | <Zack Cerza> @John Mulligan can you share a logfile? |
2024-10-22T17:52:20.358Z | <John Mulligan> <https://paste.centos.org/view/99d5980f> |
2024-10-22T17:52:35.054Z | <John Mulligan> ^ @Zack Cerza |
2024-10-22T17:53:39.607Z | <John Mulligan> FWIW, Adam was also having issues yesterday. I don't know if our problems are related or totally independent |
2024-10-22T18:05:05.917Z | <Zack Cerza> definitely a bit of an odd one here. what machine is this being run on? and does anything change if you remove `archive_dir` from the job config? |
2024-10-22T18:31:13.735Z | <John Mulligan> the 'teuthology vm' |
2024-10-22T18:32:29.040Z | <John Mulligan> If I remove the archive option it fails:
```phlogistonjohn@teuthology:~$ teuthology -v --lock --block --interactive-on-error reruns/2024-10-21_1.yaml
Traceback (most recent call last):
File "/cephfs/home/phlogistonjohn/.local/bin/teuthology", line 8, in <module>
sys.exit(main())
File "/cephfs/home/phlogistonjohn/teuthology/scripts/run.py", line 38, in main
teuthology.run.main(args)
File "/cephfs/home/phlogistonjohn/teuthology/teuthology/run.py", line 342, in main
set_up_logging(verbose, archive)
File "/cephfs/home/phlogistonjohn/teuthology/teuthology/run.py", line 26, in set_up_logging
os.mkdir(archive)
PermissionError: [Errno 13] Permission denied: '/home/teuthworker/archive/phlogistonjohn-2024-10-17_11:49:38-orch:cephadm-wip-phlogistonjohn-testing-1-2024-10-16-1345-distro-default-smithi/7953420'``` |
2024-10-22T18:37:52.813Z | <Patrick Donnelly> well, teuthology.front intentionally doesn't let you look at teuthworker's mount of teuthology (file system) |
2024-10-22T18:38:00.095Z | <Patrick Donnelly> so the EPERM makes sense |
2024-10-22T18:46:34.208Z | <John Mulligan> yes, I was just trying to answer the question 🙂 |
2024-10-22T18:56:07.766Z | <Patrick Donnelly> oh, ya 🙂 |
2024-10-22T19:30:53.198Z | <Zack Cerza> whoops, `archive_path` is the field I meant, and it's in your `reruns/2024-10-21_1.yaml` |
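Editor's note: the suggestion above is to drop the `archive_path` field from the job config before rerunning. A minimal sketch of doing that without hand-editing follows. It assumes `archive_path` is a single-line, top-level scalar in the YAML. The helper name `drop_top_level_key` and the sample values are hypothetical and not part of teuthology.

```python
def drop_top_level_key(yaml_text: str, key: str) -> str:
    """Return yaml_text without the single-line, top-level entry `key`.

    Assumption: the value fits on one line (a plain scalar), which is
    what a path-like field such as archive_path would normally be.
    """
    prefix = key + ":"
    kept = [
        line
        for line in yaml_text.splitlines(keepends=True)
        # Top-level keys start in column 0; indented lines are left alone.
        if not line.startswith(prefix)
    ]
    return "".join(kept)


if __name__ == "__main__":
    # Hypothetical job config fragment, for illustration only.
    sample = (
        "archive_path: /some/archive/dir\n"
        "owner: someone@example\n"
        "nested:\n"
        "  archive_path: kept-because-indented\n"
    )
    print(drop_top_level_key(sample, "archive_path"), end="")
```

Writing the result to a copy of the rerun YAML, rather than overwriting it, keeps the original job config available for comparison.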
2024-10-22T19:43:59.345Z | <Zack Cerza> oof. I get a 500 when trying to view hosts in the fog web ui. I see this in the apache log:
```[Tue Oct 22 19:42:17.918974 2024] [proxy_fcgi:error] [pid 18188] [client 172.21.0.100:62669] AH01071: Got error 'PHP message: PHP Fatal error: Allowed memory size of 536870912 bytes exhausted (tried to allocate 28672 bytes) in /var/www/html/fog/lib/fog/fogcontroller.class.php on line 260\nPHP message: PHP Fatal error: Allowed memory size of 536870912 bytes exhausted (tried to allocate 28672 bytes) in /var/www/html/fog/commons/init.php on line 428\n'```
@Dan Mick is it safe to reboot the host? |
2024-10-22T19:56:57.076Z | <Dan Mick> why tf is it using that much memory ffs php |
2024-10-22T19:57:25.804Z | <Dan Mick> why don't we try restarting the php service first |
2024-10-22T19:59:38.793Z | <Dan Mick> ...or something. the task list shows reimages in progress, it'd be nicer not to kill them |
2024-10-22T20:02:08.008Z | <Dan Mick> looks like mysqld is the memory pig. maybe that's safeish to restart |
2024-10-22T20:02:47.080Z | <John Mulligan> @Zack Cerza ok, I'll try removing that and seeing what happens |
2024-10-22T20:04:28.131Z | <Dan Mick> restarted mysqld, didn't really help. There's plenty of free memory |
2024-10-22T20:04:37.830Z | <Dan Mick> php has its own internal limit that I raised some time back |
2024-10-22T20:04:44.280Z | <Dan Mick> could probably raise it again |
2024-10-22T20:07:54.398Z | <Kyrylo Shatskyy> mysqld? I thought teuthology was using postgresql, what is the mysqld needed for |
2024-10-22T20:08:10.495Z | <Dan Mick> `php_admin_value[memory_limit] = 512M` in `/etc/php/7.2/fpm/pool.d/www.conf` |
2024-10-22T20:08:43.242Z | <Dan Mick> @Kyrylo Shatskyy fog |
2024-10-22T20:09:56.728Z | <Kyrylo Shatskyy> oh... |
2024-10-22T20:10:58.008Z | <Dan Mick> @Zack Cerza updated to 1G, reloaded php7.2-fpm service |
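Editor's note: the 536870912-byte limit in the Apache log is the same value as the `512M` setting quoted above. A small sketch of the arithmetic follows, assuming PHP's shorthand where `K`, `M` and `G` are powers of 1024. The helper name `php_shorthand_to_bytes` is hypothetical.

```python
def php_shorthand_to_bytes(value: str) -> int:
    """Convert a PHP-style size such as '512M' or '1G' to bytes."""
    units = {"K": 1024, "M": 1024**2, "G": 1024**3}
    value = value.strip()
    suffix = value[-1].upper()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    # No suffix: the value is already a byte count.
    return int(value)


if __name__ == "__main__":
    old_limit = php_shorthand_to_bytes("512M")
    new_limit = php_shorthand_to_bytes("1G")
    # The old limit matches the number reported in the PHP fatal error.
    print(old_limit == 536870912)
    # Raising the setting to 1G doubles the per-request ceiling.
    print(new_limit // old_limit)
```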
2024-10-22T20:15:18.970Z | <Dan Mick> fwiw "list all hosts" works for me |
2024-10-22T20:21:10.805Z | <John Mulligan> @Zack Cerza no dice. <https://paste.centos.org/view/a212ffa1> |
2024-10-22T20:27:20.026Z | <Zack Cerza> listing all hosts worked before; viewing individuals still doesn't |
2024-10-22T20:33:18.771Z | <Zack Cerza> @John Mulligan that output is identical to the paste from before |
2024-10-22T21:16:13.684Z | <Zack Cerza> FOG upstream has an issue tracking this memory problem: <https://github.com/FOGProject/fogproject/issues/515>
Their "fix" is really a workaround that drops most of the history in the db so that queries don't get too big |