ceph - cephfs - 2024-09-18

Timestamp (UTC) | Message
2024-09-18T05:15:01.304Z
<Venky Shankar> ah kk
2024-09-18T05:16:06.844Z
<Venky Shankar> by default: autoclose=300s and timeout=60s
2024-09-18T05:16:20.449Z
<Venky Shankar> what did your setting look like?
2024-09-18T05:26:38.261Z
<Adam D> currently session_autoclose=1500, session_timeout=1200
2024-09-18T05:27:50.064Z
<Venky Shankar> kk. any reason for not using the defaults?
2024-09-18T05:29:33.670Z
<Adam D> we use kdb+, which keeps a lot of files/segments open during processing. We are currently on kernel 5.15, because every attempt to move to a newer kernel ends with capabilities problems - I'm currently trying to debug it
2024-09-18T05:31:18.415Z
<Adam D> the larger session_timeout is mainly caused by this kdb+ software
2024-09-18T07:47:27.686Z
<Igor Golikov> Hi, I am still fighting to get debug symbols displayed in gdb. I am using the debug env script `ceph-debug-docker.sh`, and it pulls binaries and builds the image. But when I am in the image, the binary I need to check is stripped (in this case `/usr/bin/ceph-mds`)
Here is the info on this binary:
[root@29ed102ec572 ~]# file /bin/ceph-mds
/bin/ceph-mds: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=55337129c0b7e4adef052d4e951c28987529f0d9, for GNU/Linux 3.2.0, **stripped**
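The check above can be scripted. This is a minimal Python sketch, not part of any Ceph tooling: the helper name `is_stripped` is hypothetical, and it assumes the usual `file` wording for ELF binaries, where one comma-separated field reads either `stripped` or `not stripped`.

```python
def is_stripped(file_output: str) -> bool:
    """Return True if `file` output reports the binary as stripped.

    Assumption: `file` lists attributes as comma-separated fields and one
    of them is exactly "stripped" or "not stripped".
    """
    # Split into fields, trim whitespace and any markdown bold markers
    # (the chat paste above wraps the word in "**").
    fields = [field.strip().strip("*") for field in file_output.split(",")]
    # Compare whole fields so that "not stripped" does not match.
    return "stripped" in fields


# Output pasted in the chat above.
SAMPLE = (
    "/bin/ceph-mds: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), "
    "dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, "
    "BuildID[sha1]=55337129c0b7e4adef052d4e951c28987529f0d9, "
    "for GNU/Linux 3.2.0, **stripped**"
)

if __name__ == "__main__":
    print(is_stripped(SAMPLE))
```

Matching whole fields rather than searching for the substring avoids misreading `not stripped` as stripped.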
2024-09-18T07:48:18.622Z
<Igor Golikov> What is the correct way to debug core dumps? I feel that I am missing something and can't guess it...
2024-09-18T07:48:52.673Z
<Venky Shankar> oh, if the binary is stripped, doesn't having the debug info packages installed give good stack frames? (cc @Milind Changire)
2024-09-18T07:49:21.836Z
<Igor Golikov> Thread 27 (LWP 72397):
#0  0x00007fddd808679a in ?? ()
#1  0x0000562871f7dc30 in ?? ()
#2  0x0000562800000189 in ?? ()
#3  0x000056286d5eca40 in ?? ()
#4  0x0000000000000000 in ?? ()
2024-09-18T07:49:31.301Z
<Igor Golikov> yeah, doesn't matter what I do, this is the output
2024-09-18T07:49:56.153Z
<Milind Changire> yeah, binary should not be stripped
2024-09-18T07:50:14.212Z
<Igor Golikov> I even built the SHA that produced the crash on a vossi machine and tried to load that binary (which is not stripped) with the provided core dump
2024-09-18T07:50:37.582Z
<Venky Shankar> Normally, when I build a test branch using ptl-tool, I pass in `--debug-build` that allows nice stack frames to be displayed
2024-09-18T07:51:00.811Z
<Igor Golikov> what is the ptl tool? i build with ninja
2024-09-18T07:51:01.218Z
<Venky Shankar> I guess Yuri built a non-debug build.
2024-09-18T07:51:09.056Z
<Milind Changire> that might not work Igor
it's better to work with the exact build RPMs
2024-09-18T07:51:54.968Z
<Venky Shankar> > what is the ptl tool? i build with ninja
nice little tool to build a test build for a set of PRs. It's nicely integrated with redmine to create QA trackers for testing.
2024-09-18T07:51:57.281Z
<Igor Golikov> @Milind Changire so I don't see any way to debug the dump if the binary was stripped
2024-09-18T07:52:11.732Z
<Venky Shankar> 😞
2024-09-18T07:53:55.932Z
<Igor Golikov> well i will start reading logs...
2024-09-18T07:54:02.103Z
<Igor Golikov> thanks folks
2024-09-18T09:54:57.318Z
<Igor Golikov> Do I need to be a part of the Ceph org on GitHub?
2024-09-18T09:55:01.626Z
<Igor Golikov> i think i am not right no
2024-09-18T09:55:02.994Z
<Igor Golikov> now
2024-09-18T09:58:03.329Z
<Igor Golikov> or, being more specific, if I want to trigger a build for a specific SHA, how can I do it? :) should I access Jenkins directly?
2024-09-18T10:41:43.638Z
<Venky Shankar> @Igor Golikov you need to push a branch to ceph-ci to get builds
2024-09-18T10:42:23.161Z
<Venky Shankar> what's the sha? I can check if we have recent (debug) builds?
2024-09-18T10:42:51.783Z
<Venky Shankar> If not, then you can have shaman (ceph-ci) build them for you...
2024-09-18T10:49:08.648Z
<Igor Golikov> `sha1: 26c3fb8e197dcf7a49a54d1f4c8a7362ee35a8ea`
2024-09-18T10:49:50.842Z
<Igor Golikov> I will first try to run with the latest squid-release, and if it doesn't crash I will try to find the fix with bisect
2024-09-18T10:50:15.243Z
<Igor Golikov> some hands on with teuthology, this is the way to learn
2024-09-18T10:51:16.949Z
<Igor Golikov> `teuthology-lock --lock-many 2 --machine-type smithi` reports that only 1 node can be locked. Can I omit smithi and lock any type of node?
2024-09-18T10:58:03.809Z
<Venky Shankar> @Igor Golikov smithis are the only ones that are running tests. There used to be other nodes, but those aren't used...
2024-09-18T10:59:33.113Z
<Venky Shankar> @Igor Golikov <https://shaman.ceph.com/builds/ceph/squid-release/26c3fb8e197dcf7a49a54d1f4c8a7362ee35a8ea/> -- this build is 47 days old...
2024-09-18T11:00:19.600Z
<Venky Shankar> so, yeh, try latest squid-release or maybe even one of squid branches (<https://shaman.ceph.com/builds/ceph/squid/>)
2024-09-18T11:26:14.824Z
<Igor Golikov> if the build is available, are the repos for it available as well?
2024-09-18T11:28:40.413Z
<Igor Golikov> apparently, there is no such repo available in Shaman, that's why I was not able to create a debug docker container with squid-release and this SHA
2024-09-18T11:46:48.056Z
<Venky Shankar> The packages are located on chacra nodes. E.g.: <https://chacra.ceph.com/r/ceph/quincy/b12291d110049b2f35e32e0de30d70e9a4c060d2/centos/8/flavors/default/x86_64/>
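The two URLs quoted in this thread suggest a path layout of branch, then sha1, then distro details. The sketch below only rebuilds those URLs from their parts. The function names and parameters are hypothetical, the layout is inferred solely from the examples above rather than from any documented shaman or chacra API, and a well-formed URL does not mean the packages still exist.

```python
def shaman_build_url(ref: str, sha1: str) -> str:
    """Build page for a branch/sha1, following the shaman link quoted above."""
    return f"https://shaman.ceph.com/builds/ceph/{ref}/{sha1}/"


def chacra_repo_url(
    ref: str,
    sha1: str,
    distro: str,
    distro_version: str,
    flavor: str = "default",  # assumption: taken from the example URL
    arch: str = "x86_64",     # assumption: taken from the example URL
) -> str:
    """Package directory for a build, following the chacra link quoted above."""
    return (
        f"https://chacra.ceph.com/r/ceph/{ref}/{sha1}/"
        f"{distro}/{distro_version}/flavors/{flavor}/{arch}/"
    )


if __name__ == "__main__":
    # Values taken from the messages above.
    print(shaman_build_url("squid-release", "26c3fb8e197dcf7a49a54d1f4c8a7362ee35a8ea"))
    print(chacra_repo_url("quincy", "b12291d110049b2f35e32e0de30d70e9a4c060d2", "centos", "8"))
```

Opening the resulting chacra URL in a browser is then a quick way to see whether the packages for a given sha1 are still there or have been garbage collected.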
2024-09-18T11:47:21.708Z
<Venky Shankar> It's possible that for the squid release branch (sha) the packages got garbage collected 😕
2024-09-18T13:31:53.377Z
<Venky Shankar> That was my home dir eating up space. Moved that to an LVM volume now. Space reclaimed!
2024-09-18T13:31:59.472Z
<Venky Shankar> thx @Patrick Donnelly
2024-09-18T14:01:29.984Z
<gregsfortytwo> Oh if this is actually a squid RC I think it’s built with release flags instead of dev/debug builds. And maybe not in the same place?
2024-09-18T15:20:15.394Z
<Patrick Donnelly> np
2024-09-18T19:02:59.622Z
<olesalscheider> I upgraded ubuntu 22.04 to 24.04 which also brought an update of ceph from reef to squid (rc). This is just a test install to play around, so it does not matter too much if data is lost.
2024-09-18T19:03:36.264Z
<olesalscheider> But since the upgrade ceph-mon is broken and fails to start with the following backtrace: dpaste.com/4QY9VTQKW.txt
2024-09-18T19:04:08.802Z
<olesalscheider> I tried rc2 from -proposed (https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2065515) but it still fails.
2024-09-18T19:04:35.772Z
<olesalscheider> Do you have any idea what might be wrong?
2024-09-18T19:38:31.725Z
<gregsfortytwo> There was an encoding problem that slipped into a release [https://github.com/ceph/ceph/pull/53340#discussion_r1399255031](https://github.com/ceph/ceph/pull/53340#discussion_r1399255031) but I thought it was cleaned up going forward
2024-09-18T19:39:53.547Z
<gregsfortytwo> @Patrick Donnelly @Venky Shankar are we going to see a rash of reef->squid upgrade failures on that bal_rank_mask issue? We need a strategy and to communicate about it more clearly if this is an ongoing problem :/
2024-09-18T19:41:50.322Z
<olesalscheider> I found that issue. But what does this mean in practice for upgrading? Do I just need a "new enough" or "old enough" version and it will work? Or is something corrupted now which I cannot restore?
2024-09-18T19:46:04.325Z
<gregsfortytwo> I thought newer versions had code to deal with it, thus my confusion here too :)
2024-09-18T20:17:56.830Z
<Patrick Donnelly> there was never a release with that encoding bug
2024-09-18T20:18:03.286Z
<Patrick Donnelly> we caught it before it went out
2024-09-18T20:44:09.062Z
<gregsfortytwo> So it was just because of the kernel client that we had to muck with it in release versions?
2024-09-18T20:44:24.574Z
<gregsfortytwo> There’s something else causing a decode issue, then, and I’ve no idea what it would be
2024-09-18T20:54:46.818Z
<Patrick Donnelly> I see you are installing v19.1.1, perhaps you were running a dev version of reef with the encoding bug?
2024-09-18T20:54:55.212Z
<Patrick Donnelly> s/running/upgrading from/