2024-09-19T07:14:17.726Z | <Venky Shankar> RFR: <https://github.com/ceph/ceph/pull/59874> (a bit urgent since openstack manila workflow is affected) |
2024-09-19T07:38:44.167Z | <Igor Golikov> Hi, can you share a basic template for the teuthology conf file? I am trying to pick fragments of the configuration from teuthology.log, but it seems to contain a lot of information specific to that run. The manual suggests having very few conf options:
<https://docs.ceph.com/projects/teuthology/en/latest/detailed_test_config.html#test-configuration> |
2024-09-19T07:39:32.318Z | <Igor Golikov> I understand that I should replace the roles and the targets with the info from the log file and machines I have locked for this test, respectively |
2024-09-19T08:20:57.279Z | <olesalscheider> Patrick Donnelly: I had the official debs from ceph's ppa for reef installed on Ubuntu 22.04. This should not have been any dev version. |
2024-09-19T08:21:47.633Z | <olesalscheider> Then I upgraded to ubuntu 24.04, which removed the ppa and installed ubuntu's packaged version of ceph. They are currently shipping a snapshot of squid from git, and rc2 is currently in "proposed" stage. |
2024-09-19T08:22:51.372Z | <olesalscheider> It was of course stupid of me not to check that the ubuntu update meant upgrading to a non-release version of ceph, but... I somehow expect I might not be the only one running into that situation. |
2024-09-19T08:25:56.779Z | <olesalscheider> I think the current version of the package on ubuntu noble (not in -proposed) might contain the bug, judging by their bug tracker. But rc2 should be fine? |
2024-09-19T08:26:30.780Z | <olesalscheider> Can things be corrupted if I went from reef (stable release) -> broken dev version, did not start -> squid rc2 (now also does not start)? |
2024-09-19T11:24:36.530Z | <Igor Golikov> Does it usually take a long time to start a teuthology run? I have had 12 jobs queued since this morning, and nothing is happening there |
2024-09-19T11:27:07.025Z | <jcollin> probably low priority. |
2024-09-19T11:46:59.350Z | <Igor Golikov> I didn't set it explicitly... what is the recommended setting? |
2024-09-19T11:54:09.025Z | <Venky Shankar> @Igor Golikov `priority: 1000` - it might take weeks to even get scheduled |
2024-09-19T11:54:18.808Z | <Venky Shankar> for ~12 jobs you can use prio 50 |
2024-09-19T12:05:40.317Z | <Igor Golikov> oh... i will kill them |
2024-09-19T12:05:41.567Z | <Igor Golikov> thanks |
2024-09-19T12:53:53.433Z | <Venky Shankar> I'm swamped with BZs, need some more time before getting to this. sorry! |
2024-09-19T12:58:31.577Z | <Venky Shankar> (let's catch up tomorrow on this -- sent a cal invite for tomorrow) |
2024-09-19T12:58:45.827Z | <Venky Shankar> I will go through the tracker updates by the time we meet tomorrow. |
2024-09-19T13:59:08.309Z | <Patrick Donnelly> a broken dev version could have introduced the bad encoding, perhaps |
2024-09-19T13:59:22.451Z | <Patrick Donnelly> we really need to know what versions of Ceph the cluster was upgraded from / to |
2024-09-19T18:07:29.565Z | <olesalscheider> The upgrade path was from 18.2.4 -> 19.xx (git snapshot from 1st of march 2024, commit 4c76c50) -> 19.2.0-rc2 |
2024-09-19T18:08:21.278Z | <olesalscheider> It broke when going from 18.2.4 to the snapshot at 4c76c50 (as expected), but now it also does not work with 19.2.0-rc2 |
2024-09-19T18:14:39.372Z | <Patrick Donnelly> ya 4c76c50 broke it |
2024-09-19T18:20:44.803Z | <Patrick Donnelly> If you must rescue this cluster, it would involve using ceph-monstore-tool but I have never had occasion to use it |
2024-09-19T18:21:06.078Z | <Patrick Donnelly> probably you'd need to nuke the most recent fsmaps in the mon's dbs |
2024-09-19T18:59:42.451Z | <olesalscheider> I would try it. If it does not work it is not a disaster, but I think it might be helpful to have a working solution for whoever else runs into that problem. |
2024-09-19T19:28:54.421Z | <olesalscheider> Can you give me a pointer on how to remove the most recent fsmap with ceph-monstore-tool? There is no option that sounds completely obvious to me |
2024-09-19T20:36:54.713Z | <Patrick Donnelly> well I was trying to test it on a dev cluster but it threw exceptions when I tried to use it |
2024-09-19T20:37:02.654Z | <Patrick Donnelly> so I don't know unfortunately |
2024-09-19T20:37:13.751Z | <Patrick Donnelly> as I said, I've never had a reason to use it until your situation came up : / |