ceph - sepia - 2024-12-11

Timestamp (UTC) | Message
2024-12-11T03:17:36.423Z
<Sepia OpenShift> [FIRING:1] teuthology (SmithiQueuePaused metrics [teuthology.front.sepia.ceph.com:61764](http://teuthology.front.sepia.ceph.com:61764) teuthology-exporter smithi 2712 openshift-user-workload-monitoring/user-workload teuthology-exporter warning) | https://console-openshift-console.apps.os.sepia.ceph.com/monitoring/#/alerts?receiver=%23sepia
2024-12-11T09:21:30.751Z
<Fredolin B Brone> Hi team,
I have been facing issues building Ceph. At first an openssl version mismatch occurred. To solve it by rebuilding Ceph from scratch, I unknowingly cleared the existing directories and files (including openssl). Now I cannot connect to the vossi01 machine. Can someone help me out with this?
<https://gist.github.com/Matan-B/6ba8fd8dd88ab078f0131201bb59eb5f>
This contains the list of commands I used.
2024-12-11T14:25:11.826Z
<Adam Kupczyk> Hi guys,
I am looking for someone who knows how jenkins builds and tests PRs.
The problem I have is that unittests are run in parallel, and **VERY** often the jenkins agent crashes while executing "unittest_bluefs".
The tests causing the problem are the 2 that try to write 5GB in one go. This stretches memory to the point of OOM.
I attempted a PR that tunes down the offending tests, but that defeats their purpose.
<https://github.com/ceph/ceph/pull/61034>

I would like to change the jenkins make check runner to execute some tests (for starters unittest_bluefs) standalone.
Maybe such a feature is already available?
2024-12-11T14:28:26.324Z
<Casey Bodley> maybe this should run in teuthology instead of make check?
2024-12-11T14:32:51.427Z
<Adam Kupczyk> Maybe.
Can you recall the criteria for something to be eligible for the quick make check in jenkins?
2024-12-11T14:51:46.598Z
<Casey Bodley> i don't think we really have any criteria, which explains how we've ended up with several tests taking 45+ minutes each. i've advocated that 'make check' should only run _unit_ tests, but Sage wanted more smoke testing like against vstart clusters. as a result, now we can't afford to add nice things like static analysis. i really don't think this approach is sustainable
2024-12-11T14:53:19.359Z
<Casey Bodley> sorry for the rant :)
2024-12-11T14:53:28.761Z
<gregsfortytwo> The problem we had was that we can’t validate stuff like CLI commands without a vstart cluster, which is very awkward but spiritually very much a unit test 
2024-12-11T14:53:36.188Z
<Adam Kupczyk> I am now considering splitting unittest_bluefs into typical and special cases.
Normal invocation: you get typical cases, with GTEST_SKIP() for special cases.
Teuthology: all tests are run.
2024-12-11T14:53:49.866Z
<gregsfortytwo> I would agree a 5GB data write does not sound like a unit test to me
2024-12-11T14:54:02.816Z
<Casey Bodley> why can't cli tests run against a ceph cluster in teuthology?
2024-12-11T14:54:29.749Z
<Adam Kupczyk> > why can't cli tests run against a ceph cluster in teuthology?
Can't they?
2024-12-11T14:54:47.812Z
<gregsfortytwo> Well, I suppose they could. But why have unit tests at all when we can run all our tests in teuthology? :p
2024-12-11T14:55:10.617Z
<Casey Bodley> i think we'd need to choose a concise definition of 'unit test'
2024-12-11T14:55:42.171Z
<Casey Bodley> it can't be tested as a unit if it depends on a filesystem, ceph cluster, etc
2024-12-11T14:55:49.840Z
<gregsfortytwo> Rather than bike shedding over a precise definition, let’s just move these ones and be practical about test duration and load when it comes to adding new ones
2024-12-11T14:56:08.787Z
<Adam Kupczyk> For me the baseline is that I would want the "jenkins make check" step to spend 50% of its time compiling and 50% running make check. And not make the runner explode.
2024-12-11T14:56:26.528Z
<gregsfortytwo> It will be easier to draw a hard line when we have a system for running teuthology locally
2024-12-11T14:57:25.645Z
<Adam Kupczyk> Is it realistic to call it "when"? I remember the dream, but I thought we dropped the effort.
2024-12-11T14:59:02.375Z
<Adam Kupczyk> For example, "ceph_test_objectstore" which clearly is a unittest is running on teuthology only.
2024-12-11T15:03:52.716Z
<Adam Kupczyk> Maybe it calls for "unittest" suite ?
2024-12-11T15:04:02.264Z
<gregsfortytwo> Igor told me he found @Zack Cerza’s work on local runs and it seemed pretty sophisticated but stopped a bit ago, and I think it’s because local container builds became the blocker (which I heard he’s working on?)
2024-12-11T15:04:48.424Z
<Casey Bodley> how does a local teuthology runner help us validate prs?
2024-12-11T15:05:25.076Z
<Casey Bodley> the dev posts results on the pr?
2024-12-11T15:06:26.590Z
<gregsfortytwo> If it’s easy to run tests locally, moving them out of make check is less annoying for anyone developing the features covered by those tests. Do you enjoy having to use teuthology as part of your “did I break the cli with this patch” work loop?
2024-12-11T15:07:04.160Z
<gregsfortytwo> The reason people like being in the unit tests is just to avoid needing to use the lab. Not needing to use the lab means less friction in moving tests out
2024-12-11T15:07:12.245Z
<gregsfortytwo> That’s all
2024-12-11T15:12:13.879Z
<Adam Kupczyk> > The reason people like being in the unit tests is just to avoid needing to use the lab.
For me the reason is mostly shorter time to fix / test.
I do not think local teuthology will be anywhere close to it.

But for developing new tests it would be awesome.
2024-12-11T15:26:01.994Z
<Kyrylo Shatskyy> By definition a unittest should not reach outside a unit; using a filesystem is way outside a unit.
2024-12-11T15:27:38.331Z
<Kyrylo Shatskyy> I guess we as a community should rework the concept of make check.
2024-12-11T15:28:17.727Z
<Adam Kupczyk> I am coding BlueStore. I cannot imagine not using filesystem in unittests.
2024-12-11T15:29:14.644Z
<Kyrylo Shatskyy> it is not a unittest
2024-12-11T15:29:32.629Z
<Kyrylo Shatskyy> maybe stress?
2024-12-11T15:30:20.937Z
<Kyrylo Shatskyy> 45+ minutes are not unittests, another criterion of a classical unit test is speed
2024-12-11T15:31:05.593Z
<Adam Kupczyk> I have known input for which I expect specific output. I operate on a single component. It quacks like unittest.
2024-12-11T15:31:37.818Z
<Adam Kupczyk> I am not talking about ceph_test_objectstore though - calling that unittest was a stretch, I agree.
2024-12-11T15:31:50.266Z
<Kyrylo Shatskyy> maybe just because you haven’t heard about other kinds of testing 😉
2024-12-11T15:32:27.174Z
<Casey Bodley> it's fine and necessary for bluefs tests to depend on a filesystem. but the filesystem is an external dependency that a unit test would try to mock away
2024-12-11T15:32:57.882Z
<Kyrylo Shatskyy> seriously, ending up with OOM sounds like a memory leak if we are talking about unit tests 🙂
2024-12-11T15:33:46.195Z
<Casey Bodley> > I guess we as a community should rework the concept of make check.
i'd love to see some discussion about what this would look like. but we'd also need to commit resources to actually doing the rework, and that's likely to be the hard part
2024-12-11T15:33:58.217Z
<Adam Kupczyk> > seriously, ending up with OOM sounds like a memory leak if we are talking about unit tests
It is just writing 5GB in one go. It gets amplified to 10GB for a short time.
2024-12-11T15:34:51.719Z
<Kyrylo Shatskyy> 45+ minutes is a short time?
2024-12-11T15:34:55.496Z
<Kyrylo Shatskyy> I don’t think so
2024-12-11T15:36:25.274Z
<Kyrylo Shatskyy> in reality it interferes with all the testing and blocks the other tests from running, or produces unexpected errors in other tests which should not fail at all
2024-12-11T15:37:19.499Z
<Kyrylo Shatskyy> btw, in a container it becomes even worse
2024-12-11T15:37:33.636Z
<Adam Kupczyk> Short is a relative concept. Bottom line is that I am testing a component that does not provide value in isolation. It has to fulfill some requirements and that is being tested. It can be called whatever, I don't care, but for now I call it unittest.
2024-12-11T15:37:36.471Z
<Kyrylo Shatskyy> btw, in a container it becomes even worse
2024-12-11T15:39:00.635Z
<Kyrylo Shatskyy> short means you have no time to go for another coffee
2024-12-11T15:40:55.536Z
<Kyrylo Shatskyy> Adam, we can try to start by splitting functional tests from unit tests, and maybe then move on to the others
2024-12-11T15:43:04.616Z
<Adam Kupczyk> Sadly, I do not have the time budget for advanced solutions in this regard. I wish it had been done, plus I would welcome even more unittests for specific parts of BlueStore, but I simply cannot afford it. And documentation.
2024-12-11T15:44:49.233Z
<Adam Kupczyk> So for now I am going for least effort: skipping big tests in make check, and creating a teuthology job to invoke them as part of the rados suite.
2024-12-11T15:45:50.218Z
<Kyrylo Shatskyy> as an option, can we selectively run the big tests after all the more unittest-like ones?
2024-12-11T16:49:43.955Z
<Adam Kupczyk> > as an option, can we selectively run the big tests after all the more unittest-like ones?
This is exactly the thing I looked for help to do.
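One way this could be wired up at the CTest level, as a rough sketch and not something confirmed in this thread: tag the heavy tests with the `LABELS` test property, run the unlabeled tests in parallel first, and run the labeled ones afterwards with a single job. The test names and the `large` label below are placeholders. In Ceph the tests would be registered by `add_ceph_unittest()`, which is assumed here to end up calling `add_test()`.

```cmake
cmake_minimum_required(VERSION 3.16)
project(label_split_sketch NONE)
enable_testing()

# Stand-in tests (hypothetical names); each one just echoes a line.
add_test(NAME unittest_small COMMAND ${CMAKE_COMMAND} -E echo "quick test")
add_test(NAME unittest_big COMMAND ${CMAKE_COMMAND} -E echo "memory-hungry test")

# Mark the heavy test so ctest can select or exclude it by label.
set_tests_properties(unittest_big PROPERTIES LABELS "large")
```

With that in place, `ctest -LE large -j 8` would run everything except the labeled tests in parallel, and a following `ctest -L large -j 1` would run the heavy ones on their own. The job count of 8 is arbitrary.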
2024-12-11T16:50:31.509Z
<Adam Kupczyk> I just realized that executing unittests on teuthology will not be that easy. They are not in the install targets, so they will not be in the images provided.
2024-12-11T16:53:29.672Z
<Casey Bodley> renaming unittest_bluefs to ceph_test_bluefs would cause it to be packaged/installed
2024-12-11T16:55:08.894Z
<Adam Kupczyk> I think I need to write something like
```
install(TARGETS ceph_test_objectstore
  DESTINATION ${CMAKE_INSTALL_BINDIR})
```
The problem for me is different - do I want to do it?
2024-12-11T16:57:14.728Z
<Adam Kupczyk> @Casey Bodley Are you positive that renaming is enough? It feels off to me.
2024-12-11T16:58:30.089Z
<Casey Bodley> sorry yes, you need the cmake install() directive. for rpm/deb packaging, we automatically include ceph_test_* in the package that gets installed in teuthology
2024-12-11T17:04:15.359Z
<Adam Kupczyk> Related question - do you happen to know how tests are picked up for make check?
2024-12-11T17:04:54.116Z
<Adam Kupczyk> ```add_ceph_unittest(unittest_bluefs) ```
?
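For reference, if `add_ceph_unittest()` registers the binary through CMake's `add_test()` (assumed here, not verified in this thread), CTest's `RUN_SERIAL` test property is one candidate for keeping a single test from overlapping with the others. The sketch below uses stand-in echo commands rather than the real binaries, and `unittest_small` is a made-up name.

```cmake
cmake_minimum_required(VERSION 3.16)
project(run_serial_sketch NONE)
enable_testing()

# Stand-ins for tests that add_ceph_unittest() would normally register.
add_test(NAME unittest_small COMMAND ${CMAKE_COMMAND} -E echo "small test")
add_test(NAME unittest_bluefs COMMAND ${CMAKE_COMMAND} -E echo "large test")

# RUN_SERIAL asks CTest not to run this test in parallel with any other test,
# even when ctest is invoked with -j.
set_tests_properties(unittest_bluefs PROPERTIES RUN_SERIAL TRUE)
```

Running `ctest -j 4` on this project should still schedule `unittest_bluefs` by itself. Whether that alone is enough to avoid the OOM on the jenkins agents would need to be checked.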
2024-12-11T18:40:10.898Z
<Dan Mick> I don't know how openssl first started to be broken on vossi01, but it certainly looks like it's broken now.  What caused you to start messing with the installed versions of openssl?
2024-12-11T18:42:17.689Z
<gregsfortytwo> can one of the admins handle this key creation today? <https://tracker.ceph.com/issues/69166> thanks!
2024-12-11T18:43:58.076Z
<David Galloway> @fernando.alcocer.ocho Are you able to push this change today?
2024-12-11T18:47:51.085Z
<gregsfortytwo> Thanks David!
@Slava Dubeyko ^
2024-12-11T18:55:46.944Z
<fernando.alcocer.ocho> On it
2024-12-11T19:59:57.897Z
<David Galloway> @Adam Kraitman Can you please add monitoring for the LRC in Grafana?
2024-12-11T23:09:18.189Z
<Laura Flores> @Dan Mick can you approve the redmine account request for Camila Kulahlioglu?
2024-12-11T23:39:21.959Z
<Dan Mick> @Laura Flores yeah probably, is there a redmine ticket?
2024-12-11T23:39:36.829Z
<Dan Mick> er, sorry, nm, shorted out
2024-12-11T23:43:03.550Z
<Laura Flores> There’s no ticket, she just created an account and it needs to be approved. Not sure what that looks like though. Did you get it figured out?
2024-12-11T23:47:25.501Z
<Dan Mick> no.  I see that there's an account created on 11/19, but I don't see anything that needs to be approved
2024-12-11T23:48:14.611Z
<Laura Flores> Hmm ok maybe she needs to redo it. 
2024-12-11T23:48:42.088Z
<Dan Mick> it shows her as active
2024-12-11T23:48:53.931Z
<Dan Mick> there's a box for "send account details to user", I will select that and apply
2024-12-11T23:49:14.280Z
<Laura Flores> Ok thanks
2024-12-11T23:50:15.364Z
<Dan Mick> I see a log entry that it delivered to [rpi.edu](http://rpi.edu)
2024-12-11T23:50:36.725Z
<Dan Mick> # grep kulahe3 /var/log/mail.log
Dec 11 23:49:01 tracker-prod postfix/smtp[349820]: DB64D138850: to=<[kulahe3@rpi.edu](mailto:kulahe3@rpi.edu)>, relay=[mail.rpi.edu](http://mail.rpi.edu)[128.113.0.6]:25, delay=0.83, delays=0.11/0.01/0.27/0.44, dsn=2.0.0, status=sent (250 2.0.0 4BBNn1bk014575 Message accepted for delivery)
2024-12-11T23:51:38.362Z
<Dan Mick> my account doesn't show that "send account info" so maybe she's in some state that isn't indicated elsewise.  or maybe that's just because she's never logged in
2024-12-11T23:52:07.460Z
<Dan Mick> although yours has it.  I'll send it to you too so you can see what it sends
2024-12-11T23:53:39.414Z
<Laura Flores> Ok thanks! I appreciate it
