2024-06-26T04:48:22.054Z | <Venky Shankar> You could attach the logs to the tracker @Erich Weiler |
2024-06-26T11:19:00.794Z | <Dhairya Parmar> I have the MDS log downloaded from them |
2024-06-26T11:51:06.048Z | <Dhairya Parmar> @Venky Shankar |
2024-06-26T11:52:25.606Z | <Dhairya Parmar> is it possible to dump the `pre_allocated` inos set using ceph-dencoder?
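Something along these lines is what I have in mind (just a sketch; the pool name, object name, and dencoder type here are guesses on my part):
```
# sketch only: fetch the MDS sessionmap object from the metadata pool
# (the pool name and object name below are assumptions)
rados -p cephfs.a.meta get mds0_sessionmap /tmp/sessionmap.bin

# check whether a matching type is registered with ceph-dencoder
ceph-dencoder list_types | grep -i session

# if one exists, decode it and dump JSON to inspect the preallocated inos
ceph-dencoder type <TypeName> import /tmp/sessionmap.bin decode dump_json
``` |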
2024-06-26T12:57:21.660Z | <Dhairya Parmar> @Kotresh H R from your status today
> I found that's because of the OSD full of metadata pool and data pool. Because of this fs connection was hung during mount.
Usually the client would bail out via `Client::_handle_full_flag`; if it hung, it looks like the osd full/nearfull/backfillfull ratios were close to the defaults?
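You can check what the cluster is actually running with, e.g.:
```
# show the currently configured ratios
# (defaults are roughly nearfull 0.85, backfillfull 0.90, full 0.95)
ceph osd dump | grep ratio
``` |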
2024-06-26T13:01:25.519Z | <Dhairya Parmar> @Leonid Usov <https://github.com/ceph/ceph/pull/58281> |
2024-06-26T13:01:33.682Z | <Dhairya Parmar> do you think the PR is in the right direction? |
2024-06-26T13:02:38.305Z | <Dhairya Parmar> at least from my runs, that was the case. since the metadata pool is full, the MDS starts reporting slow metadata IO, the client can't trim the cache, and the mount hangs |
2024-06-26T13:03:16.530Z | <Dhairya Parmar> but if you reduce them to some fairer values, like
```mon osd nearfull ratio: 0.6
mon osd backfillfull ratio: 0.6
mon osd full ratio: 0.7```
it works fine and the client should bail out.
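You can also set these on a running cluster without restarting, e.g.:
```
# lower the ratios at runtime so the full flag kicks in much earlier
ceph osd set-nearfull-ratio 0.6
ceph osd set-backfillfull-ratio 0.6
ceph osd set-full-ratio 0.7
``` |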
2024-06-26T13:03:26.759Z | <Kotresh H R> It was default |
2024-06-26T13:03:36.970Z | <Kotresh H R> in vstart cluster |
2024-06-26T13:03:40.555Z | <Dhairya Parmar> ok so yes it should be these values then |
2024-06-26T13:07:27.402Z | <Leonid Usov> does that compile 🙂 |
2024-06-26T13:07:28.225Z | <Leonid Usov> ? |
2024-06-26T13:08:18.601Z | <Dhairya Parmar> haven't tried, this is just to get an initial opinion from you on whether my approach is correct |
2024-06-26T13:08:40.981Z | <Leonid Usov> it looks to be in the right direction, yes |
2024-06-26T13:08:43.513Z | <Leonid Usov> I’ll add comments |
2024-06-26T13:09:05.014Z | <Dhairya Parmar> okay |
2024-06-26T13:10:29.069Z | <Dhairya Parmar> AFAIK, the behaviour is quite random when the ratios are at their defaults. If you start vstart with these values set
```index 59a3798744d..e86a820c190 100755
--- a/src/vstart.sh
+++ b/src/vstart.sh
@@ -804,10 +804,11 @@ prepare_conf() {
[global]
fsid = $(uuidgen)
- osd failsafe full ratio = .99
- mon osd full ratio = .99
- mon osd nearfull ratio = .99
- mon osd backfillfull ratio = .99
+ osd failsafe full ratio = 1.0
+ osd mon report interval = 5
+ mon osd full ratio = 0.7
+ mon osd nearfull ratio = 0.6
+ mon osd backfillfull ratio = 0.6
mon_max_pg_per_osd = ${MON_MAX_PG_PER_OSD:-1000}
erasure code dir = $EC_PATH
plugin dir = $CEPH_LIB
@@ -866,10 +867,11 @@ EOF
bluestore_spdk_mem = 2048"
else
BLUESTORE_OPTS=" bluestore block db path = $CEPH_DEV_DIR/osd\$id/block.db.file
- bluestore block db size = 1073741824
+ bluestore block db size = 53687091
bluestore block db create = true
bluestore block wal path = $CEPH_DEV_DIR/osd\$id/block.wal.file
- bluestore block wal size = 1048576000
+ bluestore block wal size = 52428800
+ bluestore block size = 5368709120
bluestore block wal create = true"
if [ ${#block_devs[@]} -gt 0 ] || \
[ ${#bluestore_db_devs[@]} -gt 0 ] || \```
you'd get consistent results.
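While reproducing, you can watch the pools fill up and see when the full flag kicks in with the usual status commands, e.g.:
```
# overall and per-pool utilization
ceph df
# per-OSD utilization and nearfull/full state
ceph osd df
# shows OSD_NEARFULL / OSD_FULL warnings once the ratios are crossed
ceph health detail
``` |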
2024-06-26T13:42:23.048Z | <Kotresh H R> Yeah, you are right. But my intention was to test and review the clone_progress PR, so I didn't notice the storage space and kept on creating clones, which resulted in the hang. |
2024-06-26T14:06:06.602Z | <Erich Weiler> The logs are a 300 MB tarball - would it allow such a large attachment @Venky Shankar? |
2024-06-26T14:07:07.409Z | <Erich Weiler> Can anyone upload items or update an issue in redmine there? |
2024-06-26T16:11:20.725Z | <Dhairya Parmar> oh okay. if that's a repetitive workload, I'd suggest you tune those values, otherwise you're bound to land in such a scenario. |