ceph - ceph-devel - 2024-09-27

Timestamp (UTC) | Message
2024-09-27T05:36:31.798Z
<cz tan> Hi, I encountered this error while compiling Ceph on CentOS 8 (ARM). My lttng package is lttng-ust-devel-2.8.1-11.el8.aarch64. Does anyone know the cause? Thanks.
```
BUILD/ceph-17.2.7/src/librbd/librbd.cc:6072:3: error: 'STAP_PROBEV' was not declared in this scope
 6072 |   tracepoint(librbd, discard_exit, r);
```
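Nobody answered this in the channel. One plausible cause, offered only as an unconfirmed assumption: when lttng-ust is configured with SystemTap SDT integration (`LTTNG_UST_HAVE_SDT_INTEGRATION`, which we assume is defined in `<lttng/ust-config.h>`), `tracepoint()` also emits a `STAP_PROBEV(...)` call, and `<sys/sdt.h>` only defines `STAP_PROBEV` when `SDT_USE_VARIADIC` is defined before its first inclusion. The hypothetical probe below, compiled with the same compiler and include paths as the failing Ceph build, reports which of those pieces are present; the file name, `PROBE_*` macros and function names are made up for illustration.

```cpp
// sdt_probe_check.cc: hypothetical diagnostic, not part of Ceph or lttng-ust.
// Compile with the same compiler and -I flags as the failing Ceph build.
#if defined(__has_include)
#  if __has_include(<lttng/ust-config.h>)
#    include <lttng/ust-config.h>  // assumed to define LTTNG_UST_HAVE_SDT_INTEGRATION
#    define PROBE_HAVE_UST_CONFIG 1
#  endif
#  if __has_include(<sys/sdt.h>)
#    ifndef SDT_USE_VARIADIC
#      define SDT_USE_VARIADIC 1   // must be set before <sys/sdt.h> is first included
#    endif
#    include <sys/sdt.h>           // SystemTap SDT macros (systemtap-sdt-devel)
#    define PROBE_HAVE_SYS_SDT_H 1
#  endif
#endif

// True if lttng-ust's configuration header was found on the include path.
constexpr bool ust_config_found() {
#ifdef PROBE_HAVE_UST_CONFIG
  return true;
#else
  return false;
#endif
}

// True if lttng-ust reports SDT integration, i.e. tracepoint() is expected
// to reference STAP_PROBEV as well (assumption, see the note above).
constexpr bool ust_sdt_integration() {
#ifdef LTTNG_UST_HAVE_SDT_INTEGRATION
  return true;
#else
  return false;
#endif
}

// True if <sys/sdt.h> was found on the include path.
constexpr bool sdt_header_found() {
#ifdef PROBE_HAVE_SYS_SDT_H
  return true;
#else
  return false;
#endif
}

// True if STAP_PROBEV is defined after including <sys/sdt.h>
// with SDT_USE_VARIADIC set.
constexpr bool stap_probev_defined() {
#ifdef STAP_PROBEV
  return true;
#else
  return false;
#endif
}
```

Check the results with `static_assert` or print them from a throwaway `main`. If `ust_sdt_integration()` is true but `stap_probev_defined()` is false, the installed `sys/sdt.h` is the first suspect; if both are true, a header elsewhere in the build that includes `<sys/sdt.h>` before lttng's `tracepoint.h` without `SDT_USE_VARIADIC` would be the next thing to look for.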
2024-09-27T08:12:58.127Z
<Lucian Petrut> There's a PR that broke the MDS. The Windows CI caught the problem (it runs the libcephfs tests, which the "make check" job doesn't), but the PR was force-merged anyway.

<https://github.com/ceph/ceph/pull/58936>

<https://jenkins.ceph.com/job/ceph-windows-pull-requests/47415/artifact/artifacts/cluster/ceph_logs/mds.a.log>

```
2024-09-25T22:52:15.995+0000 7f5443bc7640 -1 /home/ubuntu/ceph/src/osdc/Journaler.h: In function 'bool Journaler::is_readonly() const' thread 7f5443bc7640 time 2024-09-25T22:52:15.993313+0000
/home/ubuntu/ceph/src/osdc/Journaler.h: 568: FAILED ceph_assert(!true)

 ceph version 351d92 (c351d92b0d9db66780dbf0781c81428652c3eec7) squid (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x15d) [0x7f5449e62b72]
 2: /home/ubuntu/ceph/build/lib/libceph-common.so.2(+0x2bada1) [0x7f5449e62da1]
 3: (MDLog::create(MDSContext*)+0x266) [0x55d6fdcf0346]
 4: (MDSRank::boot_create()+0x1bb) [0x55d6fd8ff1bb]
 5: (MDSRankDispatcher::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&, MDSMap const&)+0x280b) [0x55d6fd905bfb]
 6: (MDSDaemon::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&)+0xe8b) [0x55d6fd8cf25b]
 7: (MDSDaemon::handle_core_message(boost::intrusive_ptr<Message const> const&)+0x371) [0x55d6fd8d2fb1]
 8: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xfb) [0x55d6fd8d373b]
 9: (DispatchQueue::entry()+0x629) [0x7f544a1844f9]
 10: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f544a278e61]
 11: (Thread::entry_wrapper()+0x54) [0x7f5449f88b94]
 12: /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f54496a1ac3]
 13: /lib/x86_64-linux-gnu/libc.so.6(+0x126850) [0x7f5449733850]

2024-09-25T22:52:15.999+0000 7f5443bc7640 -1 *** Caught signal (Aborted) **
 in thread 7f5443bc7640 thread_name:ms_dispatch
```
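The assertion text `!true` is a useful clue. One reading, not confirmed in the thread: in builds without mutex debugging, Ceph's `common/ceph_mutex.h` defines lock-ownership queries such as `ceph_mutex_is_locked_by_me(m)` as plain `true`, so an assertion on their negation expands to `!true` and fails on every call. The sketch below reproduces only that macro expansion, using hypothetical `SKETCH_*` macros and a made-up `sketch_check` helper (C++14); it is not the code from `Journaler.h` or from PR 58936.

```cpp
// Hypothetical stand-in (SKETCH_*), modelled on the non-debug branch of
// Ceph's common/ceph_mutex.h, where lock ownership cannot be checked and
// the query is hard-wired to `true`.
#define SKETCH_MUTEX_IS_LOCKED_BY_ME(m) true

// Two-level assert: the outer macro macro-expands its argument before the
// inner macro stringifies it, so the recorded text shows the expanded form.
#define SKETCH_ASSERT_IMPL(expr) sketch_check((expr), #expr)
#define SKETCH_ASSERT(expr) SKETCH_ASSERT_IMPL(expr)

struct SketchAssert {
  bool ok;           // would the assertion pass?
  const char* text;  // the text an assert-failure message would print
};

constexpr SketchAssert sketch_check(bool ok, const char* text) {
  return SketchAssert{ok, text};
}

// Asserting that a lock IS held: passes vacuously in this build mode.
constexpr SketchAssert positive_lock_check(int lock) {
  (void)lock;  // the macro discards its argument, so mark it used
  return SKETCH_ASSERT(SKETCH_MUTEX_IS_LOCKED_BY_ME(lock));
}

// Asserting that a lock is NOT held: expands to !true and can never pass.
constexpr SketchAssert negated_lock_check(int lock) {
  (void)lock;
  return SKETCH_ASSERT(!SKETCH_MUTEX_IS_LOCKED_BY_ME(lock));
}

// Compile-time string comparison, used to inspect the recorded text.
constexpr bool same_text(const char* a, const char* b) {
  while (*a != '\0' && *a == *b) {
    ++a;
    ++b;
  }
  return *a == *b;
}
```

If this reading is right, only positive ownership assertions carry meaning in such builds; negating an always-true query produces an assertion that fires unconditionally, which matches the MDS aborting on its first `MDLog::create()`.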
2024-09-27T08:13:50.816Z
<Lucian Petrut> I tried a clean vstart cluster (Linux only); the MDS services crash immediately when attempting to mount CephFS.
2024-09-27T09:57:21.210Z
<Anoop C S> Our integration CI runs failed this morning while waiting for the MDS (I think it crashed). I can check whether the logs contain the above assert.
2024-09-27T11:13:17.912Z
<Lucian Petrut> I've submitted a PR that reverts these changes, unblocking the CI: <https://github.com/ceph/ceph/pull/60024>
2024-09-27T11:27:57.036Z
<Anoop C S> OK, at least the backtrace is the same:
```
Core was generated by `/usr/bin/ceph-mds -n mds.sit_fs.storage0.pishls -f --setuser ceph --setgroup ce'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44            return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
[Current thread is 1 (Thread 0x7ff2f9712640 (LWP 16))]

(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007ff2feb36ad3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2  0x00007ff2feae9686 in __GI_raise (sig=6) at ../sysdeps/posix/raise.c:26
#3  0x00005571c706c57a in reraise_fatal (signum=6) at /usr/src/debug/ceph-19.3.0-5244.g5e8f360e.el9.x86_64/src/global/signal_handler.cc:88
#4  handle_oneshot_fatal_signal (signum=6) at /usr/src/debug/ceph-19.3.0-5244.g5e8f360e.el9.x86_64/src/global/signal_handler.cc:367
#5  <signal handler called>
#6  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#7  0x00007ff2feb36ad3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#8  0x00007ff2feae9686 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#9  0x00007ff2fead3833 in __GI_abort () at abort.c:79
#10 0x00007ff2ff1a67c4 in ceph::__ceph_assert_fail(char const*, char const*, int, char const*) () from /usr/lib64/ceph/libceph-common.so.2
#11 0x00007ff2ff1a6928 in ceph::__ceph_assert_fail(ceph::assert_data const&) () from /usr/lib64/ceph/libceph-common.so.2
#12 0x00005571c700b01c in Journaler::is_readonly (this=<optimized out>, this=<optimized out>) at /usr/src/debug/ceph-19.3.0-5244.g5e8f360e.el9.x86_64/src/osdc/Journaler.h:568
#13 MDLog::create (this=0x5571c9d92000, c=<optimized out>) at /usr/src/debug/ceph-19.3.0-5244.g5e8f360e.el9.x86_64/src/mds/MDLog.cc:244
#14 0x00005571c6d34199 in MDSRank::boot_create (this=0x5571c8f9d208) at /usr/src/debug/ceph-19.3.0-5244.g5e8f360e.el9.x86_64/src/mds/MDSRank.cc:2161
#15 0x00005571c6d3c696 in MDSRankDispatcher::handle_mds_map (this=0x5571c8f9d200, m=..., oldmap=...) at /usr/src/debug/ceph-19.3.0-5244.g5e8f360e.el9.x86_64/src/mds/MDSRank.cc:2409
#16 0x00005571c6d12316 in MDSDaemon::handle_mds_map (this=<optimized out>, m=...) at /usr/src/debug/ceph-19.3.0-5244.g5e8f360e.el9.x86_64/src/mds/MDSDaemon.cc:862
#17 0x00005571c6d13127 in MDSDaemon::handle_core_message (this=this@entry=0x5571c9c12a00, m=...) at /usr/src/debug/ceph-19.3.0-5244.g5e8f360e.el9.x86_64/src/common/RefCountedObj.h:56
#18 0x00005571c6d13882 in MDSDaemon::ms_dispatch2 (this=0x5571c9c12a00, m=...) at /usr/src/debug/ceph-19.3.0-5244.g5e8f360e.el9.x86_64/src/common/RefCountedObj.h:56
#19 0x00007ff2ff3be8a2 in DispatchQueue::entry() () from /usr/lib64/ceph/libceph-common.so.2
#20 0x00007ff2ff45cfd1 in DispatchQueue::DispatchThread::entry() () from /usr/lib64/ceph/libceph-common.so.2
#21 0x00007ff2feb34d22 in start_thread (arg=<optimized out>) at pthread_create.c:443
#22 0x00007ff2febb9d40 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
```
2024-09-27T18:33:37.929Z
<Casey Bodley> <https://github.com/ceph/ceph/pull/60026> was merged to fix this.
