2024-09-27T05:36:31.798Z | <cz tan> Hi, I encountered this error while compiling on a CentOS 8 system on ARM; my lttng is lttng-ust-devel-2.8.1-11.el8.aarch64. Does anyone know the reason? Thanks
BUILD/ceph-17.2.7/src/librbd/librbd.cc:6072:3: error: 'STAP_PROBEV' was not declared in this scope
6072 | tracepoint(librbd, discard_exit, r); |
2024-09-27T08:12:58.127Z | <Lucian Petrut> There's a PR that broke MDS. The Windows CI caught the problem (it runs the libcephfs tests, the "make check" job doesn't), however it was forcefully merged.
<https://github.com/ceph/ceph/pull/58936>
<https://jenkins.ceph.com/job/ceph-windows-pull-requests/47415/artifact/artifacts/cluster/ceph_logs/mds.a.log>
```2024-09-25T22:52:15.995+0000 7f5443bc7640 -1 /home/ubuntu/ceph/src/osdc/Journaler.h: In function 'bool Journaler::is_readonly() const' thread 7f5443bc7640 time 2024-09-25T22:52:15.993313+0000
/home/ubuntu/ceph/src/osdc/Journaler.h: 568: FAILED ceph_assert(!true)
ceph version 351d92 (c351d92b0d9db66780dbf0781c81428652c3eec7) squid (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x15d) [0x7f5449e62b72]
2: /home/ubuntu/ceph/build/lib/libceph-common.so.2(+0x2bada1) [0x7f5449e62da1]
3: (MDLog::create(MDSContext*)+0x266) [0x55d6fdcf0346]
4: (MDSRank::boot_create()+0x1bb) [0x55d6fd8ff1bb]
5: (MDSRankDispatcher::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&, MDSMap const&)+0x280b) [0x55d6fd905bfb]
6: (MDSDaemon::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&)+0xe8b) [0x55d6fd8cf25b]
7: (MDSDaemon::handle_core_message(boost::intrusive_ptr<Message const> const&)+0x371) [0x55d6fd8d2fb1]
8: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xfb) [0x55d6fd8d373b]
9: (DispatchQueue::entry()+0x629) [0x7f544a1844f9]
10: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f544a278e61]
11: (Thread::entry_wrapper()+0x54) [0x7f5449f88b94]
12: /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f54496a1ac3]
13: /lib/x86_64-linux-gnu/libc.so.6(+0x126850) [0x7f5449733850]
2024-09-25T22:52:15.999+0000 7f5443bc7640 -1 *** Caught signal (Aborted) **
in thread 7f5443bc7640 thread_name:ms_dispatch``` |
2024-09-27T08:14:05.014Z | <Lucian Petrut> I tried a clean vstart cluster (Linux only), the MDS services crash immediately after attempting to do a cephfs mount |
2024-09-27T09:57:21.210Z | <Anoop C S> Our integration CI runs failed this morning while waiting for the MDS (and I think it crashed). I can check if the logs contain the above assert. |
2024-09-27T11:13:17.912Z | <Lucian Petrut> I've submitted a PR that reverts these changes, unblocking the CI: <https://github.com/ceph/ceph/pull/60024> |
2024-09-27T11:27:57.036Z | <Anoop C S> Ok, at least the backtrace is the same:
```Core was generated by `/usr/bin/ceph-mds -n mds.sit_fs.storage0.pishls -f --setuser ceph --setgroup ce'.
Program terminated with signal SIGABRT, Aborted.
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44 return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
[Current thread is 1 (Thread 0x7ff2f9712640 (LWP 16))]
(gdb) bt
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1 0x00007ff2feb36ad3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2 0x00007ff2feae9686 in __GI_raise (sig=6) at ../sysdeps/posix/raise.c:26
#3 0x00005571c706c57a in reraise_fatal (signum=6) at /usr/src/debug/ceph-19.3.0-5244.g5e8f360e.el9.x86_64/src/global/signal_handler.cc:88
#4 handle_oneshot_fatal_signal (signum=6) at /usr/src/debug/ceph-19.3.0-5244.g5e8f360e.el9.x86_64/src/global/signal_handler.cc:367
#5 <signal handler called>
#6 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#7 0x00007ff2feb36ad3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#8 0x00007ff2feae9686 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#9 0x00007ff2fead3833 in __GI_abort () at abort.c:79
#10 0x00007ff2ff1a67c4 in ceph::__ceph_assert_fail(char const*, char const*, int, char const*) () from /usr/lib64/ceph/libceph-common.so.2
#11 0x00007ff2ff1a6928 in ceph::__ceph_assert_fail(ceph::assert_data const&) () from /usr/lib64/ceph/libceph-common.so.2
#12 0x00005571c700b01c in Journaler::is_readonly (this=<optimized out>, this=<optimized out>) at /usr/src/debug/ceph-19.3.0-5244.g5e8f360e.el9.x86_64/src/osdc/Journaler.h:568
#13 MDLog::create (this=0x5571c9d92000, c=<optimized out>) at /usr/src/debug/ceph-19.3.0-5244.g5e8f360e.el9.x86_64/src/mds/MDLog.cc:244
#14 0x00005571c6d34199 in MDSRank::boot_create (this=0x5571c8f9d208) at /usr/src/debug/ceph-19.3.0-5244.g5e8f360e.el9.x86_64/src/mds/MDSRank.cc:2161
#15 0x00005571c6d3c696 in MDSRankDispatcher::handle_mds_map (this=0x5571c8f9d200, m=..., oldmap=...) at /usr/src/debug/ceph-19.3.0-5244.g5e8f360e.el9.x86_64/src/mds/MDSRank.cc:2409
#16 0x00005571c6d12316 in MDSDaemon::handle_mds_map (this=<optimized out>, m=...) at /usr/src/debug/ceph-19.3.0-5244.g5e8f360e.el9.x86_64/src/mds/MDSDaemon.cc:862
#17 0x00005571c6d13127 in MDSDaemon::handle_core_message (this=this@entry=0x5571c9c12a00, m=...) at /usr/src/debug/ceph-19.3.0-5244.g5e8f360e.el9.x86_64/src/common/RefCountedObj.h:56
#18 0x00005571c6d13882 in MDSDaemon::ms_dispatch2 (this=0x5571c9c12a00, m=...) at /usr/src/debug/ceph-19.3.0-5244.g5e8f360e.el9.x86_64/src/common/RefCountedObj.h:56
#19 0x00007ff2ff3be8a2 in DispatchQueue::entry() () from /usr/lib64/ceph/libceph-common.so.2
#20 0x00007ff2ff45cfd1 in DispatchQueue::DispatchThread::entry() () from /usr/lib64/ceph/libceph-common.so.2
#21 0x00007ff2feb34d22 in start_thread (arg=<optimized out>) at pthread_create.c:443
#22 0x00007ff2febb9d40 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81``` |
2024-09-27T18:33:37.929Z | <Casey Bodley> <https://github.com/ceph/ceph/pull/60026> merged to fix this |