ceph - crimson - 2024-11-19

Timestamp (UTC) | Message
2024-11-19T11:05:02.634Z
<Jose J Palacios-Perez> Bad news: the comparison of results suggests that CPU balancing does not improve performance 😢
(response curves of IOPS vs. latency, OSD CPU utilization and OSD memory utilization, respectively): https://files.slack.com/files-pri/T1HG3J90S-F081254KG87/download/cyan_3osd_3react_bal_vs_unbal_4krandread_iops_vs_lat.png
2024-11-19T11:05:02.644Z
<Jose J Palacios-Perez> https://files.slack.com/files-pri/T1HG3J90S-F081254PQH5/download/cyan_3osd_3react_bal_vs_unbal_4krandread_osd_cpu.png
2024-11-19T11:05:02.647Z
<Jose J Palacios-Perez> https://files.slack.com/files-pri/T1HG3J90S-F081A2T0V8W/download/cyan_3osd_3react_bal_vs_unbal_4krandread_osd_mem.png
2024-11-19T11:08:15.352Z
<Jose J Palacios-Perez> • There does not seem to be any significant performance difference between the configurations.
• The default (unbalanced CPU core allocation) seems to use less memory than the balanced configurations, which does not make sense to me: they have the same number of OSDs and reactors, so why would they need to allocate more? I will look at the raw data to rule out a bug in my scripts.
2024-11-19T11:09:55.219Z
<Jose J Palacios-Perez> FIO utilization looks consistent across the tests: https://files.slack.com/files-pri/T1HG3J90S-F081DRW2CMB/download/cyan_3osd_3react_bal_vs_unbal_4krandread_fio_cpu.png
2024-11-19T11:09:55.220Z
<Jose J Palacios-Perez> https://files.slack.com/files-pri/T1HG3J90S-F081VC79Z5X/download/cyan_3osd_3react_bal_vs_unbal_4krandread_fio_mem.png
2024-11-19T11:12:24.335Z
<Jose J Palacios-Perez> • There does not seem to be any significant performance difference between the configurations.
• The default (unbalanced CPU core allocation) seems to use less memory than the balanced configurations, which does not make sense to me: they have the same number of OSDs and reactors, so why would they need to allocate more? I will look at the raw data to rule out a bug in my scripts. Note that for 3 OSDs, in the default configuration all of them run on the same CPU socket, whereas in the new balanced configuration two OSDs (0 and 2) are allocated to CPU socket 0 and only OSD 1 is allocated to CPU socket 1. From the PR Matan posted (<https://github.com/ceph/ceph/pull/52404>), I can only guess that the new balanced configuration incurs additional memory copies across CPU sockets.
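The placement described above (OSDs 0 and 2 on socket 0, OSD 1 on socket 1) is what a round-robin assignment of OSDs to sockets produces. A minimal sketch, assuming a hypothetical 2-socket machine with 8 physical cores per socket; `allocate_round_robin` and the topology are illustrative names and values, not taken from the actual balancing script:

```python
def allocate_round_robin(num_osds, reactors_per_osd, cores_by_socket):
    """Assign OSDs to sockets in round-robin order.

    Each OSD receives `reactors_per_osd` distinct cores from its socket.
    `cores_by_socket` maps a socket id to the list of its core ids.
    """
    sockets = sorted(cores_by_socket)
    next_free = {s: 0 for s in sockets}  # index of the next unused core per socket
    plan = {}
    for osd in range(num_osds):
        socket = sockets[osd % len(sockets)]
        start = next_free[socket]
        cores = cores_by_socket[socket][start:start + reactors_per_osd]
        if len(cores) < reactors_per_osd:
            raise ValueError(f"socket {socket} has no free cores left for osd.{osd}")
        next_free[socket] = start + reactors_per_osd
        plan[osd] = {"socket": socket, "cores": cores}
    return plan


# Hypothetical topology: 2 sockets, 8 physical cores each.
topology = {0: list(range(0, 8)), 1: list(range(8, 16))}
for osd, placement in allocate_round_robin(3, 3, topology).items():
    print(f"osd.{osd}: socket {placement['socket']}, cores {placement['cores']}")
```

With 3 OSDs and 2 sockets, this kind of assignment always leaves one socket hosting two OSDs, so the split is uneven even though it is called balanced.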
2024-11-19T11:16:32.216Z
<Jose J Palacios-Perez> @Matan Breizman: bad news, the comparison of results suggests that CPU balancing does not improve performance 😢
(response curves of IOPS vs. latency, OSD CPU utilization and OSD memory utilization, respectively)
2024-11-19T11:49:13.330Z
<Matan Breizman> > suggest that the CPU balance does not improve performance
I don't think that we can infer that from the last run. IIUC, each OSD had 3 reactor threads. It's highly possible, given the machine has enough cores and it's not busy with other workloads, that the "unbalanced"  run resulted in roughly the same allocation as the balanced one did.
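One rough way to check this argument is to compare the number of reactor threads with the number of physical cores, and to look for reactors that share a physical core. A minimal sketch, assuming a hypothetical machine with 16 physical cores and two hardware threads per core, where logical CPUs n and n + 16 share a core (on Linux the real mapping can be read from /sys/devices/system/cpu/cpuN/topology/thread_siblings_list); the function names are illustrative:

```python
def needs_sibling_threads(num_osds, reactors_per_osd, physical_cores):
    """True if there are more reactor threads than physical cores."""
    return num_osds * reactors_per_osd > physical_cores


def shared_cores(reactor_cpus, cpu_to_core):
    """Return the physical cores that host more than one reactor thread,
    mapped to the logical CPUs placed on them."""
    by_core = {}
    for cpu in reactor_cpus:
        by_core.setdefault(cpu_to_core[cpu], []).append(cpu)
    return {core: cpus for core, cpus in by_core.items() if len(cpus) > 1}


# Hypothetical topology: 16 physical cores, logical CPUs n and n + 16 are siblings.
PHYSICAL_CORES = 16
cpu_to_core = {cpu: cpu % PHYSICAL_CORES for cpu in range(2 * PHYSICAL_CORES)}

# 3 OSDs x 3 reactors = 9 threads, which fit on 16 physical cores.
print(needs_sibling_threads(3, 3, PHYSICAL_CORES))
# No core is shared when each reactor has its own physical core.
print(shared_cores([0, 1, 2], cpu_to_core))
# CPUs 0 and 16 are siblings on core 0 in this assumed layout.
print(shared_cores([0, 16, 1], cpu_to_core))
```

If neither run places reactors on sibling CPUs, the two allocations are unlikely to differ much in throughput, which is consistent with the flat comparison above.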
2024-11-19T11:50:29.051Z
<Matan Breizman> The real difference would most likely show up once there are enough reactor threads that some might run on sibling cores, unlike in the "balanced" version, where we would restrict that.
2024-11-19T11:52:45.450Z
<Matan Breizman> Can we separate the visualizer from the balancing script? I think that it has value on its own.
2024-11-19T15:41:48.057Z
<Jose J Palacios-Perez> Yes, they are separate scripts 👍 Unfortunately, I deleted the previous socket-based balancing function. I will try to reconstruct it and keep it as a separate (albeit experimental) function.
2024-11-19T15:50:46.416Z
<Jose J Palacios-Perez> Yes, I was also wondering whether a larger configuration (e.g. 8 OSDs, each with 4 or 5 reactors) might show a different picture, since the machine will be under a higher load. The above results are for 3 OSDs with 3 reactors each. I'll restore the old balancing as a separate function and trigger the same tests for the larger configuration ASAP.
2024-11-19T15:56:41.101Z
<Jose J Palacios-Perez> This is the default CPU allocation for 3 OSDs with 3 reactors each (cyanstore); all the reactors use a single socket: https://files.slack.com/files-pri/T1HG3J90S-F081J7T196E/download/cyan_3osd_3reac_default.png
2024-11-19T16:09:00.960Z
<Jose J Palacios-Perez> At least in terms of resource consumption, my balancing approach (socket-based) is not the worst.