ceph - sepia - 2024-06-27

Timestamp (UTC)Message
2024-06-27T11:02:48.237Z
<Adam Kraitman> Can anyone update the **"centos-bluestore"** builds to run on Centos9, they are still configured to run on Centos8 ([vagrant&&libvirt&&centos8](https://jenkins.ceph.com/label/vagrant&&libvirt&&centos8)) and they are overloading the Jenkins queue
2024-06-27T13:19:07.375Z
<Yuval Lifshitz> Hi team, is there a process for adding checks to jenkins?
As part of a Google Summer of Code project, @Suyash Dongre has been running clang-tidy against our codebase (starting from OSD and RGW) and fixing the issues it finds.
He reached a milestone of fixing (or marking as false-positive) all issues of a certain type, and we would like to take advantage of that and add this as a check to jenkins, so any new issues found of that type could be attributed to the new code checked in the PR.
Since this is experimental, we would like to start from a non-required check. In addition, we are just checking one issue that was already cleaned, and therefore should not create "noise"
2024-06-27T13:22:37.835Z
<John Mulligan> Does anyone know if the infrastructure meeting will be held today?
2024-06-27T13:43:43.806Z
<Adam Kraitman> Yeah I think it will
2024-06-27T15:25:16.599Z
<Ken Dreyer> What is the agenda for the infrastructure meeting today? <https://pad.ceph.com/p/ceph-infra-weekly>
2024-06-27T16:34:14.324Z
<John Mulligan> there's a topic about the mailing lists (convo already started)
2024-06-27T16:41:48.549Z
<Rishabh Dave> I am trying to kill a few teuthology jobs and the teuthology-kill command has been stuck for several minutes -

```~$ teuthology-kill -r rishabh-2024-06-27_16:02:06-fs:functional-main-testing-default-smithi; teuthology-kill -r rishabh-2024-06-27_16:02:25-fs:functional-main-testing-default-smithi
2024-06-27 16:31:06,789.789 INFO:teuthology.kill:Using machine type 'smithi' received from paddles.
2024-06-27 16:31:06,789.789 INFO:teuthology.kill:Checking Beanstalk Queue...```
And this is my second attempt. Any idea what I should do? If using sudo is the answer, I don't have permissions to do that, and according to @Zack Cerza this is not really required.
2024-06-27T16:42:19.272Z
<Rishabh Dave> @Patrick Donnelly is there anyone else who i should be tagging for such issues?
2024-06-27T16:53:33.726Z
<Rishabh Dave> Perhaps the teuthology machine is under too much load?

```$ free -h
              total        used        free      shared  buff/cache   available
Mem:           47Gi        13Gi       389Mi       5.0Mi        33Gi        33Gi
Swap:          31Gi       4.2Gi        27Gi
$ uptime
 16:51:58 up 222 days, 16:16, 62 users,  load average: 10.14, 5.35, 3.66
$ nproc
16```
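A rough way to read the numbers above is to divide each load average by the core count. This is only a sketch using the values from the `uptime` and `nproc` output; the helper name `load_per_core` is made up for illustration. Note that on Linux the load average also counts tasks in uninterruptible I/O wait, so a ratio below 1.0 suggests the CPUs are not saturated but does not rule out I/O stalls.

```python
# Values copied from the `uptime` and `nproc` output above.
load_averages = {"1min": 10.14, "5min": 5.35, "15min": 3.66}
cpu_count = 16


def load_per_core(load, cpus):
    """Return the average number of runnable/waiting tasks per core."""
    return load / cpus


for window, load in load_averages.items():
    ratio = load_per_core(load, cpu_count)
    print(f"{window}: {ratio:.2f} tasks per core")
```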
2024-06-27T16:56:44.640Z
<Zack Cerza> this is a limitation of the beanstalkd queue. teuthology has to iterate over each job in the queue; during that time it is effectively paused. this typically isn't very noticeable as the queue doesn't often sit at over 2k jobs, let alone nearly 7k
@Aishwarya Mathuria has been working on a replacement for the beanstalkd queue, and I have a complementary feature planned that will further speed things up, but those won't land today of course
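To illustrate why the kill takes longer as the queue grows, here is a toy model, not teuthology's actual code. It assumes a queue where only the head job can be taken and inspected, so finding the jobs of one run means cycling through every queued job once. The `deque`, the job dicts, and the run names are all hypothetical stand-ins.

```python
from collections import deque


def kill_run(queue, run_name):
    """Remove every job of run_name; return (removed, inspected)."""
    removed = 0
    inspected = 0
    # One full pass: each job is taken from the head exactly once.
    for _ in range(len(queue)):
        job = queue.popleft()
        inspected += 1
        if job["run"] == run_name:
            removed += 1          # drop the job
        else:
            queue.append(job)     # put it back, original order is preserved
    return removed, inspected


# Hypothetical queue: 7000 jobs spread evenly over 100 runs.
queue = deque({"id": i, "run": f"run-{i % 100}"} for i in range(7000))
removed, inspected = kill_run(queue, "run-42")
print(removed, inspected, len(queue))
```

In this model, removing 70 jobs still costs 7000 inspections, which matches the description of the cost scaling with total queue size rather than with the size of the run being killed.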
2024-06-27T16:57:38.106Z
<Zack Cerza> the `--preserve-queue` flag exists in case you need to kill running jobs but do not need to modify the queue.
2024-06-27T17:01:43.776Z
<Zack Cerza> you've had sudo access to kill jobs for nearly five years, fwiw.
2024-06-27T17:02:29.151Z
<Rishabh Dave> strange. just a couple of minutes ago i saw it reported 800 jobs: https://files.slack.com/files-pri/T1HG3J90S-F079WQ94B0E/download/image.png
2024-06-27T17:04:17.232Z
<Rishabh Dave> > you've had sudo access to kill jobs for nearly five years, fwiw.
oh, that's strange. i always get prompted for a password, and i am sure i have never set one for myself.

plus i was told it was expected and i should ask someone who has sudo access to kill jobs.
2024-06-27T17:04:55.133Z
<Zack Cerza> that page has always been misleading; apologies. the lab dashboard (pinned link for this channel) has a much more complete view.
2024-06-27T17:05:31.092Z
<Zack Cerza> looks like we need to figure out why you're being asked for your password
2024-06-27T17:08:45.883Z
<Rishabh Dave> one more thing, is JCPU reported by `w` reliable? It is `28:15` if i am correct. Second highest is `4.81s`

```bhubbard pts/162  tmux(1613739).%0 13Mar24 14:42m 28:15   0.19s /cephfs/home/bhubbard/src/teuthology/virtualenv/bin/python3.10 -c import pexpect; pexpect.run('console -M conserver.front.sepia.ceph.com -p 3109 -s smithi045', logfile=open('/home/bhubbard/working/archive2/console_logs/smithi045.log', 'wb'), timeout=No```
But I don't know if this means anything useful.
2024-06-27T17:10:01.716Z
<Zack Cerza> sorry I don't know what you're asking there
2024-06-27T17:10:21.347Z
<Zack Cerza> I was just able to, as your user, use sudo to kill a process...
2024-06-27T17:43:44.125Z
<Rishabh Dave> doesn't work for me; tried just now -

```$ sudo teuthology-kill -r rishabh-2024-06-27_16:02:25-fs:functional-main-testing-default-smithi
[sudo] password for rishabh: ```
