ceph - ceph-devel - 2024-07-22

Timestamp (UTC) | Message
2024-07-22T11:47:05.649Z
<sur5r> Hi all! One of my OSDs fails to start after a reboot. Gist is: "bluestore::NCB::__restore_allocator::No Valid allocation info on disk (empty file)" -> "_init_alloc::NCB::restore_allocator() failed! Run Full Recovery from ONodes (might take a while) ..." -> "HybridAllocator init_rm_free lambda Uexpected extent:  0x3b000~10000" -> "./src/os/bluestore/HybridAllocator.cc: 178: FAILED ceph_assert(false)"
2024-07-22T11:47:18.417Z
<sur5r> Does that ring any bells? My search came up empty so far.
2024-07-22T11:49:28.413Z
<sur5r> Other OSDs on the same machine are fine. I can put logs somewhere if there's interest. I can leave the OSD in that state if wanted. At least for a couple of days.
2024-07-22T12:12:29.395Z
<Igor Fedotov> Hi sur5r!
2024-07-22T12:12:46.214Z
<Igor Fedotov> It would be interesting to take a look at the log indeed.
2024-07-22T12:12:52.059Z
<Igor Fedotov> Even two logs:
2024-07-22T12:13:17.420Z
<Igor Fedotov> 1. The log around the first crash.
2024-07-22T12:13:45.570Z
<Igor Fedotov> 2. The OSD startup log with debug-bluestore and debug-bluefs config params set to 20
2024-07-22T15:51:47.199Z
<Bill Scales> It's a read op that reads more than one region:

offset = 512, length = 512 and
offset = 16K + 512, length = 512

rbd is processing a discard request for offset = 513, length = 16K, but because this is a LUKS-enabled volume it has to read/write whole 512-byte sectors, so it issues this unusual read op to fetch the start and end sectors and perform a read-modify-write style update.

The I/O exercisers for testing core RADOS code are not imaginative enough to generate this type of read op.
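A minimal sketch of the sector arithmetic behind that read op, using the offsets quoted above and assuming a 512-byte sector size (an illustration only, not librbd's actual code):

```python
SECTOR = 512  # assumed LUKS sector size: whole sectors must be read/written

def rmw_reads(offset, length):
    """Head/tail sector reads needed before an unaligned discard can be
    applied as a read-modify-write."""
    reads = []
    if offset % SECTOR:                      # discard starts mid-sector
        reads.append((offset - offset % SECTOR, SECTOR))
    end = offset + length
    if end % SECTOR:                         # discard ends mid-sector
        reads.append((end - end % SECTOR, SECTOR))
    return reads

# The discard described above: offset = 513, length = 16K
print(rmw_reads(513, 16 * 1024))  # [(512, 512), (16896, 512)] -> the 512~512 and 16896~512 reads
```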
2024-07-22T16:05:46.700Z
<Alex Ainscow> How common is this style of operation?
2024-07-22T16:06:11.392Z
<Alex Ainscow> In particular, do we need to optimise for it?
2024-07-22T16:15:01.906Z
<Ilya Dryomov> It shouldn't be common at all
2024-07-22T23:11:55.447Z
<Reily Siegel> Hello! I posted this question over in <#C1HFU4JK1|>, but haven't seen any response. Is it a more appropriate question for here, and if so, is there any way to recover from this failure mode and upgrade?
<https://ceph-storage.slack.com/archives/C1HFU4JK1/p1721498845344699>
2024-07-22T23:21:47.693Z
<rzarzynski> replicating locally. `RBD_FEATURES=61` is a must
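For reference, a small sketch decoding that feature mask; the bit values below are the librbd `RBD_FEATURE_*` constants as I understand them, so treat the mapping as an assumption:

```python
# Assumed librbd feature bits (RBD_FEATURE_*)
FEATURES = {
    1: "layering",
    2: "striping",
    4: "exclusive-lock",
    8: "object-map",
    16: "fast-diff",
    32: "deep-flatten",
    64: "journaling",
}

def decode(mask):
    return [name for bit, name in FEATURES.items() if mask & bit]

print(decode(61))  # ['layering', 'exclusive-lock', 'object-map', 'fast-diff', 'deep-flatten']
```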
2024-07-22T23:46:48.956Z
<rzarzynski> @Bill Scales is right:

```
  -11> 2024-07-22T23:38:13.138+0000 7f7fc4577700 10 osd.0 pg_epoch: 21 pg[1.2s0( v 21'23 (0'0,21'23] local-lis/les=14/15 n=2 ec=14/14 lis/c=14/0 les/c/f=15/0/0 sis=14) [0,NONE,1]p0(0) r=0 lpr=14 crt=21'23 lcod 21'22 mlcod 21'22 active+undersized+degraded mbc={} trimq=[2~1]] do_osd_op 1:60a3bc56:::rbd_data.2.10321519e9c1.0000000000000001:head [read 512~512,read 16896~512]
...
   -4> 2024-07-22T23:38:13.138+0000 7f7fc4577700 20 osd.0 pg_epoch: 21 pg[1.2s0( v 21'23 (0'0,21'23] local-lis/les=14/15 n=2 ec=14/14 lis/c=14/0 les/c/f=15/0/0 sis=14) [0,NONE,1]p0(0) r=0 lpr=14 crt=21'23 lcod 21'22 mlcod 21'22 active+undersized+degraded mbc={} trimq=[2~1]] get_min_want_to_read_shards before: offset 0 length 4096 want_to_read
   -3> 2024-07-22T23:38:13.138+0000 7f7fc4577700 20 osd.0 pg_epoch: 21 pg[1.2s0( v 21'23 (0'0,21'23] local-lis/les=14/15 n=2 ec=14/14 lis/c=14/0 les/c/f=15/0/0 sis=14) [0,NONE,1]p0(0) r=0 lpr=14 crt=21'23 lcod 21'22 mlcod 21'22 active+undersized+degraded mbc={} trimq=[2~1]] get_min_want_to_read_shards after: offset 0 length 4096 want_to_read 0
   -2> 2024-07-22T23:38:13.138+0000 7f7fc4577700 20 osd.0 pg_epoch: 21 pg[1.2s0( v 21'23 (0'0,21'23] local-lis/les=14/15 n=2 ec=14/14 lis/c=14/0 les/c/f=15/0/0 sis=14) [0,NONE,1]p0(0) r=0 lpr=14 crt=21'23 lcod 21'22 mlcod 21'22 active+undersized+degraded mbc={} trimq=[2~1]] get_min_want_to_read_shards before: offset 16384 length 4096 want_to_read 0
   -1> 2024-07-22T23:38:13.143+0000 7f7fc4577700 -1 /home/rzarzynski/ceph1/src/osd/ECCommon.cc: In function 'static void ECCommon::ReadPipeline::get_min_want_to_read_shards(uint64_t, uint64_t, const ECUtil::stripe_info_t&, const std::vector<int>&, std::set<int>*)' thread 7f7fc4577700 time 2024-07-22T23:38:13.139271+0000
/home/rzarzynski/ceph1/src/osd/ECCommon.cc: 336: FAILED ceph_assert(want_to_read->size() == sinfo.get_data_chunk_count())
```
2024-07-22T23:54:25.142Z
<rzarzynski> at the low level: `get_min_want_to_read_shards` makes assumptions about the contiguity of `want_to_read` in order to skip some iterations
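A rough sketch of the range-to-shard mapping in play here; the chunk size, k, and shard math below are assumptions for illustration (the PG above looks like a 2+1 profile), not the actual `ECUtil::stripe_info_t` / `ECCommon::ReadPipeline` code:

```python
CHUNK_SIZE = 4096   # assumed EC chunk size
K = 2               # assumed number of data shards (pg [0,NONE,1] suggests k=2, m=1)

def add_shards_for_range(offset, length, want_to_read):
    """Add the data shards touched by one contiguous [offset, offset+length) range."""
    first_chunk = offset // CHUNK_SIZE
    last_chunk = (offset + length - 1) // CHUNK_SIZE
    for chunk in range(first_chunk, last_chunk + 1):
        want_to_read.add(chunk % K)

# The two-region read op above ends up calling this twice against the same set:
want = set()
add_shards_for_range(0, 4096, want)       # after: {0}
add_shards_for_range(16384, 4096, want)   # after: still {0}, not all K data shards
print(want)
```

Per the note above, the real function skips iterations on the assumption that `want_to_read` was built from a single contiguous range, which the two disjoint regions in this op violate, hence the failed `ceph_assert`.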
