2024-10-10T00:33:15.476Z | <badone> Is it something like this? <https://tracker.ceph.com/issues/64681> |
2024-10-10T13:34:08.644Z | <Rost Khudov> Sorry for the delay. I am using the Reef release of Ceph,
and I remember it working about a month ago. |
2024-10-10T13:35:06.845Z | <Casey Bodley> the account feature will only be available on squid and later |
2024-10-10T13:35:34.384Z | <Rost Khudov> oh okay, good to know, thank you |
2024-10-10T14:20:51.575Z | <Teoman Onay> The ceph-ansible CI fails with the MDS not running, but when I look at the journalctl output, I get this:
```Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: /home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.3.0-5244-g5e8f360e/rpm/el9/BUILD/ceph-19.3.0-5244-g5e8f360e/src/osdc/Journaler.h: 568: FAILED ceph_assert(!true)
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]:
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: ceph version 19.3.0-5244-g5e8f360e (5e8f360e31bf28277e1a3fb999096eb2de823304) squid (dev)
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x12e) [0x7fdfdaf4f76a]
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 2: /usr/lib64/ceph/libceph-common.so.2(+0x18c928) [0x7fdfdaf4f928]
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 3: (MDLog::create(MDSContext*)+0x27c) [0x5579abedf01c]
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 4: (MDSRank::boot_create()+0x1c9) [0x5579abc08199]
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 5: (MDSRankDispatcher::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&, MDSMap const&)+0x2516) [0x5579abc10696]
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 6: (MDSDaemon::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&)+0x1026) [0x5579abbe6316]
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 7: (MDSDaemon::handle_core_message(boost::intrusive_ptr<Message const> const&)+0x367) [0x5579abbe7127]
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 8: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x162) [0x5579abbe7882]
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 9: (DispatchQueue::entry()+0x4c2) [0x7fdfdb1678a2]
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 10: /usr/lib64/ceph/libceph-common.so.2(+0x442fd1) [0x7fdfdb205fd1]
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 11: /lib64/libc.so.6(+0x89d22) [0x7fdfda8ddd22]
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 12: /lib64/libc.so.6(+0x10ed40) [0x7fdfda962d40]
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]:
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: debug 0> 2024-10-10T14:06:39.133+0000 7fdfd54bb640 -1 *** Caught signal (Aborted) **
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: in thread 7fdfd54bb640 thread_name:ms_dispatch
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]:
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: ceph version 19.3.0-5244-g5e8f360e (5e8f360e31bf28277e1a3fb999096eb2de823304) squid (dev)
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 1: /lib64/libc.so.6(+0x3e730) [0x7fdfda892730]
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 2: /lib64/libc.so.6(+0x8ba6c) [0x7fdfda8dfa6c]
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 3: raise()
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 4: abort()
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x188) [0x7fdfdaf4f7c4]
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 6: /usr/lib64/ceph/libceph-common.so.2(+0x18c928) [0x7fdfdaf4f928]
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 7: (MDLog::create(MDSContext*)+0x27c) [0x5579abedf01c]
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 8: (MDSRank::boot_create()+0x1c9) [0x5579abc08199]
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 9: (MDSRankDispatcher::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&, MDSMap const&)+0x2516) [0x5579abc10696]
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 10: (MDSDaemon::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&)+0x1026) [0x5579abbe6316]
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 11: (MDSDaemon::handle_core_message(boost::intrusive_ptr<Message const> const&)+0x367) [0x5579abbe7127]
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 12: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x162) [0x5579abbe7882]
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 13: (DispatchQueue::entry()+0x4c2) [0x7fdfdb1678a2]
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 14: /usr/lib64/ceph/libceph-common.so.2(+0x442fd1) [0x7fdfdb205fd1]
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 15: /lib64/libc.so.6(+0x89d22) [0x7fdfda8ddd22]
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: 16: /lib64/libc.so.6(+0x10ed40) [0x7fdfda962d40]
Oct 10 14:06:39 mds2 ceph-mds-mds2[95698]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.``` |
2024-10-10T14:21:03.671Z | <Teoman Onay> Is it something known? |
2024-10-10T14:21:22.197Z | <Teoman Onay> Or should I create a tracker |
2024-10-10T14:49:06.535Z | <Casey Bodley> that looks like <https://tracker.ceph.com/issues/68291> which is fixed on main |
2024-10-10T15:11:49.691Z | <Andrea Bolzonella> Hi team,
We have encountered a situation in a large Pacific cluster where the OSD map is not being trimmed by the monitor, and it continues to grow.
Upon investigation, we discovered that the OSD maps are being held by some OSDs in the "osd_epochs", but these OSDs do not exist.
Is it expected behavior to have non-existing OSDs in the "osd_epochs"?
Are you aware of any bugs that might be causing this issue?
We are considering creating and destroying OSDs with these IDs, but we are concerned about causing the monitor to crash due to an assert.
Thanks for the help |
2024-10-10T15:19:04.023Z | <Teoman Onay> Weird! I just deployed an env locally and I faced the issue. |
2024-10-10T15:39:59.456Z | <Teoman Onay> looks like the container image is quite old. |
2024-10-10T15:59:42.939Z | <gregsfortytwo> @Andrea Bolzonella sounds like there was some previous disaster recovery and OSDs got marked as lost or removed, but not fully cleaned up? You can look at the source code: we only add to osd_epochs when we get a beacon, and we remove from osd_epochs when OSDs are marked out. |
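The bookkeeping described above can be sketched as a toy model. This is not Ceph code: the class `OsdEpochTracker` and its method names are hypothetical, and the assumption that the oldest tracked epoch bounds how far OSD maps can be trimmed is a simplification based only on the symptoms reported in this thread. It shows how an entry for an OSD that stopped sending beacons, but was never marked out, could keep that bound from advancing.

```python
class OsdEpochTracker:
    """Toy model of the osd_epochs bookkeeping (hypothetical, not Ceph code)."""

    def __init__(self):
        # osd id -> most recent epoch reported by that OSD
        self.osd_epochs = {}

    def handle_beacon(self, osd_id, epoch):
        # Entries are only added or refreshed when a beacon arrives.
        self.osd_epochs[osd_id] = epoch

    def mark_out(self, osd_id):
        # Entries are only dropped when the OSD is marked out.
        self.osd_epochs.pop(osd_id, None)

    def trim_floor(self):
        # Assumption: maps older than the oldest tracked epoch may be trimmed.
        return min(self.osd_epochs.values(), default=None)


tracker = OsdEpochTracker()
tracker.handle_beacon(0, 500)
tracker.handle_beacon(1, 498)
tracker.handle_beacon(7, 120)  # OSD 7 later vanishes without being marked out

# The healthy OSDs keep reporting newer epochs.
tracker.handle_beacon(0, 900)
tracker.handle_beacon(1, 905)

stale_floor = tracker.trim_floor()  # still pinned by the entry for OSD 7

tracker.mark_out(7)                 # cleaning up the stale entry
fresh_floor = tracker.trim_floor()  # now follows the live OSDs
```

In this model the stale entry for OSD 7 is the only thing holding the floor back, which matches the pattern of non-existent OSD IDs showing up in "osd_epochs" while the OSD map keeps growing.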