ceph - ceph-devel - 2024-10-11

Timestamp (UTC) | Message
2024-10-11T11:39:06.928Z
<Samaksh Dhingra> Yes
2024-10-11T13:38:38.292Z
<Casey Bodley> for dashboard/cephadm folks, i've been seeing occasional check-black failures in 'make check':
```
The following tests FAILED:
	  4 - run-tox-mgr (Failed)
	 22 - run-tox-cephadm (Failed)
```
details in <https://tracker.ceph.com/issues/68509>
2024-10-11T13:42:12.033Z
<Andrea Bolzonella> That is probably what happened. I'm trying to reproduce the issue and then test whether creating a new OSD with the same ID will fix it. Do you know if there is any assertion in that part of the code that might kill the MONs?
2024-10-11T13:43:44.008Z
<Andrea Bolzonella> I suspect it is happening because the owner of the cluster is obsessed with keeping the same ID for the OSD, and in that case an ID may not be "up" for a long time.
2024-10-11T13:45:26.824Z
<nizamial09> Looks like it's used by cephadm. @Adam King or @John Mulligan?
2024-10-11T13:47:18.593Z
<Casey Bodley> should i move the tracker issue to the Orchestrator project?
2024-10-11T13:47:56.474Z
<John Mulligan> go ahead, but I don't think this issue is related to the code. `Aborted!` is not a normal error; I think the tool is just being killed
2024-10-11T13:48:21.039Z
<John Mulligan> maybe memory usage on the workers (again)?
2024-10-11T13:48:39.119Z
<John Mulligan> is this happening frequently or just on this PR?
2024-10-11T13:49:37.904Z
<Casey Bodley> i've seen a handful of the same 2 failures on other prs
2024-10-11T15:39:34.943Z
<gregsfortytwo> Dunno, I just did a quick grep
2024-10-11T15:40:27.168Z
<gregsfortytwo> It does look to me like if you remove an OSD without marking it out, this would be the result. Not sure if there are any guardrails there or if something else will clear the osd_epochs data structure elsewhere
2024-10-11T16:11:37.663Z
<Kyrylo Shatskyy> the tox stuff is very strange when it comes to running make check
2024-10-11T16:12:55.581Z
<Kyrylo Shatskyy> it just uses the default python3 to prebuild wheels, and ignores PY overrides from scripts when installing system python dependencies
2024-10-11T20:42:38.023Z
<ljon> I have a situation where I'm writing data to a ceph rbd image using the librbd C++ or go-ceph library. Let's use go-ceph as an example. Imagine that the rbd.Image.Write function hangs: can I then call the rbd.Image.Close function? Will the close function hang as well?
The reason I want to call close when write hangs is that, at the very least, I can release the handle so that this image can be changed (written, removed, snapshotted, etc.) by other threads.
I understand that various reasons could lead to write() hanging, for example a network connection issue, or one of the OSDs being full. When answering, could you please pick several of these situations as examples and explain them? Also, please comment on whether calling close() is a good solution in these kinds of situations (and similar ones, such as read() hanging, snap purge hanging, etc.).
Thanks for the help
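A minimal sketch of the pattern being asked about, assuming go-ceph's rados and rbd packages; the pool name "rbd", image name "myimage", payload, and 30-second deadline are placeholders, not from the discussion. One caveat worth noting: if the write is stuck inside librbd, Close() may also block, since closing an image flushes pending I/O, and the stuck goroutine is leaked rather than cancelled.
```
// Sketch: run a potentially hanging Write in a goroutine and race it
// against a deadline. Pool/image names are placeholders.
package main

import (
	"fmt"
	"time"

	"github.com/ceph/go-ceph/rados"
	"github.com/ceph/go-ceph/rbd"
)

func main() {
	conn, err := rados.NewConn()
	if err != nil {
		panic(err)
	}
	if err := conn.ReadDefaultConfigFile(); err != nil {
		panic(err)
	}
	if err := conn.Connect(); err != nil {
		panic(err)
	}
	defer conn.Shutdown()

	ioctx, err := conn.OpenIOContext("rbd") // placeholder pool
	if err != nil {
		panic(err)
	}
	defer ioctx.Destroy()

	img, err := rbd.OpenImage(ioctx, "myimage", rbd.NoSnapshot)
	if err != nil {
		panic(err)
	}

	done := make(chan error, 1)
	go func() {
		_, werr := img.WriteAt([]byte("payload"), 0)
		done <- werr
	}()

	select {
	case werr := <-done:
		fmt.Println("write finished:", werr)
		img.Close()
	case <-time.After(30 * time.Second):
		// The write is stuck (network issue, full OSD, ...). Calling
		// img.Close() here may itself block while librbd flushes
		// pending I/O; the goroutine above is leaked, not cancelled.
		fmt.Println("write timed out; close may hang too")
	}
}
```
This only bounds how long the caller waits; it does not cancel the in-flight librbd operation or guarantee the handle is released.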
2024-10-11T21:09:51.026Z
<Patrick Donnelly> I'm rather blown away by Gemini's (Google's GPT) ability to create a jq command from a plain query with provided json
2024-10-11T21:11:05.508Z
<Ken Carlile> oh no, this is something that might actually get me to use AI!
2024-10-11T21:11:33.530Z
<Patrick Donnelly> I've been using it more and more, it's hit or miss but it's usually better than rolling the dice opening a stackoverflow link
2024-10-11T21:12:18.062Z
<Patrick Donnelly> i pasted a `ceph mon dump --format=json` and it replied with what it thought it was (pretty accurately); then I followed up with "use jq to select addresses with "type" equal to "v2""
2024-10-11T21:12:33.249Z
<Patrick Donnelly> and it gave me: `jq '.[] | .public_addrs.addrvec[] | select(.type == "v2") | .addr'`
2024-10-11T21:12:44.787Z
<Patrick Donnelly> which works perfectly, I'm rather stunned
2024-10-11T21:13:54.242Z
<yuriw> [https://youtu.be/PZDo-udXmgQ?si=d_6rJgoQMrayu5RU](https://youtu.be/PZDo-udXmgQ?si=d_6rJgoQMrayu5RU)

Machines will replace humans 😉 
2024-10-11T21:38:39.543Z
<ljon> I have a situation where I'm writing data to a ceph rbd image using the librbd C++ or go-ceph library. Let's use go-ceph as an example. Imagine that the rbd.Image.Write function hangs: can I then call the rbd.Image.Close function? Will the close function hang as well?
The reason I want to call close when write hangs is that, at the very least, I can release the handle so that this image can be changed (written, removed, snapshotted, etc.) by other threads.

I understand that various reasons could lead to write() hanging, for example a network connection issue, or one of the OSDs being full. When answering, could you please pick several of these situations as examples and explain them? Also, please comment on whether calling close() is a good solution in these kinds of situations (and similar ones, such as read() hanging, snap purge hanging, etc.).

Are there any settings I can use to control the hang time for rbd operations such as read, write, and snap-related operations? Say, if I call rbd write(), can I give it a timeout of, say, 30 seconds?

Thanks for the help
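On the timeout question, a hedged note: librados exposes client-side config options such as `rados_osd_op_timeout` (and `rados_mon_op_timeout`), which make operations that would otherwise hang fail with a timeout error after roughly that many seconds; as far as I know there is no per-call timeout argument on the rbd write path itself. A minimal go-ceph sketch; the helper name and the 30-second value are placeholders:
```
// Sketch: connection-wide OSD op timeout via librados config, using
// go-ceph. This is not a per-call timeout; it applies to all OSD ops
// issued over this connection.
package cephclient

import "github.com/ceph/go-ceph/rados"

// connectWithTimeout is a hypothetical helper, not part of go-ceph.
func connectWithTimeout() (*rados.Conn, error) {
	conn, err := rados.NewConn()
	if err != nil {
		return nil, err
	}
	if err := conn.ReadDefaultConfigFile(); err != nil {
		return nil, err
	}
	// Value is in seconds; 0 (the default) means never time out.
	if err := conn.SetConfigOption("rados_osd_op_timeout", "30"); err != nil {
		return nil, err
	}
	if err := conn.Connect(); err != nil {
		return nil, err
	}
	return conn, nil
}
```
With this set, a stuck write should eventually return an error instead of blocking forever, which also sidesteps the question of closing a handle with an op still in flight.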
