2024-07-22T11:47:05.649Z | <sur5r> Hi all! One of my OSDs fails to start after a reboot. Gist is: "bluestore::NCB::__restore_allocator::No Valid allocation info on disk (empty file)" -> "_init_alloc::NCB::restore_allocator() failed! Run Full Recovery from ONodes (might take a while) ..." -> "HybridAllocator init_rm_free lambda Uexpected extent: 0x3b000~10000" -> "./src/os/bluestore/HybridAllocator.cc: 178: FAILED ceph_assert(false)" |
2024-07-22T11:47:18.417Z | <sur5r> Does that ring any bells? My search came up empty so far. |
2024-07-22T11:49:28.413Z | <sur5r> Other OSDs on the same machine are fine. I can put logs somewhere if there's interest. I can leave the OSD in that state if wanted. At least for a couple of days. |
2024-07-22T12:12:29.395Z | <Igor Fedotov> Hi sur5r! |
2024-07-22T12:12:46.214Z | <Igor Fedotov> It would be interesting to take a look at the log indeed, |
2024-07-22T12:12:52.059Z | <Igor Fedotov> Even two logs: |
2024-07-22T12:13:17.420Z | <Igor Fedotov> 1. The log around the first crash. |
2024-07-22T12:13:45.570Z | <Igor Fedotov> 2. The OSD startup log with debug-bluestore and debug-bluefs config params set to 20 |
2024-07-22T15:51:47.199Z | <Bill Scales> It's a read op that is reading more than one region:
offset = 512, length = 512 and
offset = 16K + 512, length = 512
rbd is processing a discard request for offset = 513, length = 16K, but because this is a LUKS-enabled volume it needs to read/write whole 512-byte sectors, so it is issuing this unusual read op to fetch the start and end sectors and perform a read-modify-write type update.
The I/O exercisers for testing core RADOS code are not imaginative enough to generate this type of read op |
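(For anyone following along, a minimal standalone sketch — illustrative only, not librbd's crypto code, just the sector arithmetic — of why a discard at offset 513, length 16K on a volume with 512-byte crypto sectors turns into exactly those two small reads:)
```cpp
#include <cstdint>
#include <iostream>

// An unaligned discard on a 512-byte-sector crypto volume must preserve the
// partially covered sectors at either end, so those sectors are read first
// (read-modify-write); the fully covered middle can simply be discarded.
int main() {
  constexpr uint64_t sector = 512;
  const uint64_t off = 513, len = 16384;   // the discard from the log
  const uint64_t end = off + len;          // 16897

  const uint64_t head_sector = off / sector * sector;        // 512
  const uint64_t tail_sector = (end - 1) / sector * sector;  // 16896

  if (off % sector)   // partial head sector
    std::cout << "read " << head_sector << "~" << sector << "\n";  // read 512~512
  if (end % sector && tail_sector != head_sector)  // partial tail sector
    std::cout << "read " << tail_sector << "~" << sector << "\n";  // read 16896~512
}
```
The output matches the `[read 512~512,read 16896~512]` op seen in the OSD log below.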
2024-07-22T16:05:46.700Z | <Alex Ainscow> How common is this style of operation? |
2024-07-22T16:06:11.392Z | <Alex Ainscow> In particular, do we need to optimise for it? |
2024-07-22T16:15:01.906Z | <Ilya Dryomov> It shouldn't be common at all |
2024-07-22T23:11:55.447Z | <Reily Siegel> Hello! I posted this question over in <#C1HFU4JK1|>, but haven't seen any response. Is it a more appropriate question for here, and if so, is there any way to recover from this failure mode and upgrade?
<https://ceph-storage.slack.com/archives/C1HFU4JK1/p1721498845344699> |
2024-07-22T23:21:47.693Z | <rzarzynski> replicating locally. `RBD_FEATURES=61` is a must |
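(Side note on that bitmask: 61 is the usual QA feature set of layering + exclusive-lock + object-map + fast-diff + deep-flatten. A quick sanity check, with the standard RBD feature bit values written out rather than pulled from a Ceph header:)
```cpp
#include <cstdint>
#include <iostream>

// RBD feature bits (1 << n); 61 selects everything here except striping-v2.
constexpr uint64_t LAYERING       = 1 << 0;  // 1
constexpr uint64_t STRIPING_V2    = 1 << 1;  // 2  (not part of 61)
constexpr uint64_t EXCLUSIVE_LOCK = 1 << 2;  // 4
constexpr uint64_t OBJECT_MAP     = 1 << 3;  // 8
constexpr uint64_t FAST_DIFF      = 1 << 4;  // 16
constexpr uint64_t DEEP_FLATTEN   = 1 << 5;  // 32

int main() {
  std::cout << (LAYERING | EXCLUSIVE_LOCK | OBJECT_MAP | FAST_DIFF | DEEP_FLATTEN)
            << "\n";  // prints 61
}
```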
2024-07-22T23:46:48.956Z | <rzarzynski> @Bill Scales is right:
```
  -11> 2024-07-22T23:38:13.138+0000 7f7fc4577700 10 osd.0 pg_epoch: 21 pg[1.2s0( v 21'23 (0'0,21'23] local-lis/les=14/15 n=2 ec=14/14 lis/c=14/0 les/c/f=15/0/0 sis=14) [0,NONE,1]p0(0) r=0 lpr=14 crt=21'23 lcod 21'22 mlcod 21'22 active+undersized+degraded mbc={} trimq=[2~1]] do_osd_op 1:60a3bc56:::rbd_data.2.10321519e9c1.0000000000000001:head [read 512~512,read 16896~512]
...
   -4> 2024-07-22T23:38:13.138+0000 7f7fc4577700 20 osd.0 pg_epoch: 21 pg[1.2s0( v 21'23 (0'0,21'23] local-lis/les=14/15 n=2 ec=14/14 lis/c=14/0 les/c/f=15/0/0 sis=14) [0,NONE,1]p0(0) r=0 lpr=14 crt=21'23 lcod 21'22 mlcod 21'22 active+undersized+degraded mbc={} trimq=[2~1]] get_min_want_to_read_shards before: offset 0 length 4096 want_to_read
   -3> 2024-07-22T23:38:13.138+0000 7f7fc4577700 20 osd.0 pg_epoch: 21 pg[1.2s0( v 21'23 (0'0,21'23] local-lis/les=14/15 n=2 ec=14/14 lis/c=14/0 les/c/f=15/0/0 sis=14) [0,NONE,1]p0(0) r=0 lpr=14 crt=21'23 lcod 21'22 mlcod 21'22 active+undersized+degraded mbc={} trimq=[2~1]] get_min_want_to_read_shards after: offset 0 length 4096 want_to_read 0
   -2> 2024-07-22T23:38:13.138+0000 7f7fc4577700 20 osd.0 pg_epoch: 21 pg[1.2s0( v 21'23 (0'0,21'23] local-lis/les=14/15 n=2 ec=14/14 lis/c=14/0 les/c/f=15/0/0 sis=14) [0,NONE,1]p0(0) r=0 lpr=14 crt=21'23 lcod 21'22 mlcod 21'22 active+undersized+degraded mbc={} trimq=[2~1]] get_min_want_to_read_shards before: offset 16384 length 4096 want_to_read 0
   -1> 2024-07-22T23:38:13.143+0000 7f7fc4577700 -1 /home/rzarzynski/ceph1/src/osd/ECCommon.cc: In function 'static void ECCommon::ReadPipeline::get_min_want_to_read_shards(uint64_t, uint64_t, const ECUtil::stripe_info_t&, const std::vector<int>&, std::set<int>*)' thread 7f7fc4577700 time 2024-07-22T23:38:13.139271+0000
/home/rzarzynski/ceph1/src/osd/ECCommon.cc: 336: FAILED ceph_assert(want_to_read->size() == sinfo.get_data_chunk_count())
``` |
2024-07-22T23:54:25.142Z | <rzarzynski> at the low-level: `get_min_want_to_read_shards` draws assumptions on the contiguity of `want_to_read` to skip some iterations |
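(To make that concrete, a tiny self-contained toy — explicitly not the real ECCommon.cc logic; k, chunk size and the shortcut are made up for illustration — showing how a count-based "skip the rest" optimization that assumes `want_to_read` was built from a single contiguous range can trip an assert once a two-region read folds a second region, hitting an already-wanted shard, into the same set:)
```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Toy model only: k data shards, 4K chunks.  want_shards() maps one logical
// byte range onto the data shards holding it.  Its "shortcut" assumes shards
// already in want_to_read do not overlap the new range (true when one
// contiguous read is processed piecewise), so coverage is estimated as
// "already wanted + newly touched" -- the assumption a multi-region read breaks.
constexpr uint64_t k = 2, chunk = 4096;

void want_shards(uint64_t off, uint64_t len, std::set<int>* want) {
  const uint64_t first = (off / chunk) % k;
  const uint64_t touched = (off + len + chunk - 1) / chunk - off / chunk;
  if (want->size() + touched >= k) {
    // believed to now cover the whole stripe -> every data shard must be wanted
    for (uint64_t i = 0; i < touched && want->size() < k; ++i) {
      want->insert((first + i) % k);
    }
    assert(want->size() == k);  // the ceph_assert-style check that fires
    return;
  }
  for (uint64_t i = 0; i < touched; ++i) {
    want->insert((first + i) % k);
  }
}

int main() {
  std::set<int> want;
  want_shards(0, 4096, &want);      // region 1: shard 0 only; no shortcut taken
  want_shards(16384, 4096, &want);  // region 2: also shard 0 -- the shortcut
                                    // concludes "full stripe", assert fires
}
```
In this toy both regions land on the same data shard, so the count-based shortcut decides the whole stripe is wanted while the set still holds only one shard, which is the same shape as the failed assert in the log above.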