ceph - cephfs - 2024-09-19

Timestamp (UTC) | Message
2024-09-19T07:14:17.726Z
<Venky Shankar> RFR: <https://github.com/ceph/ceph/pull/59874> (a bit urgent since openstack manila workflow is affected)
2024-09-19T07:38:44.167Z
<Igor Golikov> Hi, can you share a basic template for the teuthology conf file? I am trying to pick fragments of the configuration out of teuthology.log, but a lot of the information there seems specific to that run. The manual suggests having very few conf options:
<https://docs.ceph.com/projects/teuthology/en/latest/detailed_test_config.html#test-configuration>
2024-09-19T07:39:32.318Z
<Igor Golikov> I understand that I should replace the roles and the targets with the info from the log file and machines I have locked for this test, respectively
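A minimal sketch of the kind of template asked about above, assuming the `roles` / `targets` / `tasks` layout described on the linked teuthology documentation page. The helper name `render_job_yaml`, the host names, and the SSH key strings are hypothetical placeholders, not values from any real run; the role names would come from the log file and the targets from the locked machines. The check that there are at least as many targets as role groups reflects the assumption that each role group is placed on one machine.

```python
def render_job_yaml(roles, targets, tasks):
    """Render a small teuthology-style job description as YAML text.

    roles   -- list of role groups, one group per machine (from the log)
    targets -- dict mapping user@host to that host's SSH public key
    tasks   -- list of task names to run with empty (default) config
    """
    # Assumption: each role group needs its own target machine.
    if len(targets) < len(roles):
        raise ValueError(
            f"{len(roles)} role groups but only {len(targets)} targets"
        )

    lines = ["roles:"]
    for group in roles:
        lines.append("- [" + ", ".join(group) + "]")

    lines.append("targets:")
    for host, key in targets.items():
        lines.append(f"  {host}: {key}")

    lines.append("tasks:")
    for task in tasks:
        lines.append(f"- {task}:")

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder roles, hosts, and keys for illustration only.
    print(render_job_yaml(
        roles=[["mon.a", "mds.a", "osd.0"], ["mon.b", "osd.1", "client.0"]],
        targets={
            "ubuntu@host1.example.com": "ssh-rsa PLACEHOLDER_KEY_1",
            "ubuntu@host2.example.com": "ssh-rsa PLACEHOLDER_KEY_2",
        },
        tasks=["install", "ceph"],
    ))
```

The printed text is only a starting point; run-specific settings found in teuthology.log would still have to be reviewed by hand before reuse.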
2024-09-19T08:20:57.279Z
<olesalscheider> Patrick Donnelly: I had the official debs from ceph's ppa for reef installed on Ubuntu 22.04. This should not have been any dev version.
2024-09-19T08:21:47.633Z
<olesalscheider> Then I upgraded to ubuntu 24.04, which removed the ppa and installed ubuntu's packaged version of ceph. They are currently shipping a snapshot of squid from git, and rc2 is currently in "proposed" stage.
2024-09-19T08:22:51.372Z
<olesalscheider> It was of course stupid of me not to check that the ubuntu update meant upgrading to a non-release version of ceph, but... I somehow expect I might not be the only one running into that situation.
2024-09-19T08:25:56.779Z
<olesalscheider> I think the current version of the package on ubuntu noble (not in -proposed) might contain the bug, judging by their bug tracker. But rc2 should be fine?
2024-09-19T08:26:30.780Z
<olesalscheider> Can things be corrupted if I went from reef (stable release) -> broken dev version, did not start -> squid rc2 (now also does not start)?
2024-09-19T11:24:36.530Z
<Igor Golikov> Does it usually take a long time to start a teuthology run? I have had 12 jobs queued since this morning, and nothing is happening.
2024-09-19T11:27:07.025Z
<jcollin> Probably low priority.
2024-09-19T11:46:59.350Z
<Igor Golikov> I didn't set it explicitly... what is the recommended setting?
2024-09-19T11:54:09.025Z
<Venky Shankar> @Igor Golikov `priority: 1000` - it might take weeks to even get scheduled
2024-09-19T11:54:18.808Z
<Venky Shankar> for ~12 jobs you can use prio 50
2024-09-19T12:05:40.317Z
<Igor Golikov> oh... I will kill them
2024-09-19T12:05:41.567Z
<Igor Golikov> thanks
2024-09-19T12:53:53.433Z
<Venky Shankar> I'm swamped with BZs, need some more time before getting to this. sorry!
2024-09-19T12:58:31.577Z
<Venky Shankar> (let's catch up tomorrow on this -- sent a cal invite for tomorrow)
2024-09-19T12:58:45.827Z
<Venky Shankar> I will go through the tracker updates by the time we meet tomorrow.
2024-09-19T13:59:08.309Z
<Patrick Donnelly> a broken dev version could have introduced the bad encoding, perhaps
2024-09-19T13:59:22.451Z
<Patrick Donnelly> we really need to know what versions of Ceph the cluster was upgraded from / to
2024-09-19T18:07:29.565Z
<olesalscheider> The upgrade path was from 18.2.4 -> 19.xx (git snapshot from 1st of march 2024, commit 4c76c50) -> 19.2.0-rc2
2024-09-19T18:08:21.278Z
<olesalscheider> It broke when going from 18.2.4 to the snapshot at 4c76c50 (as expected), but now it also does not work with 19.2.0-rc2
2024-09-19T18:14:39.372Z
<Patrick Donnelly> ya 4c76c50 broke it
2024-09-19T18:20:44.803Z
<Patrick Donnelly> If you must rescue this cluster, it would involve using ceph-monstore-tool but I have never had occasion to use it
2024-09-19T18:21:06.078Z
<Patrick Donnelly> probably you'd need to nuke the most recent fsmaps in the mon's dbs
2024-09-19T18:59:42.451Z
<olesalscheider> I would try it. If it does not work it is not a disaster, but I think it might be helpful to have a working solution for whoever else runs into that problem.
2024-09-19T19:28:54.421Z
<olesalscheider> Can you give me a pointer on how to remove the most recent fsmap with ceph-monstore-tool? There is no option that sounds completely obvious to me
2024-09-19T20:36:54.713Z
<Patrick Donnelly> well I was trying to test it on a dev cluster but it threw exceptions when I tried to use it
2024-09-19T20:37:02.654Z
<Patrick Donnelly> so I don't know unfortunately
2024-09-19T20:37:13.751Z
<Patrick Donnelly> as I said, I've never had a reason to use it until your situation came up : /
