2024-11-19T11:05:02.634Z | <Jose J Palacios-Perez> Bad news, the comparison of results suggests that CPU balancing does not improve performance :cry:
(response curves IOPs vs latency, OSD CPU util and OSD MEM util, resp.): https://files.slack.com/files-pri/T1HG3J90S-F081254KG87/download/cyan_3osd_3react_bal_vs_unbal_4krandread_iops_vs_lat.png |
2024-11-19T11:05:02.644Z | <Jose J Palacios-Perez> https://files.slack.com/files-pri/T1HG3J90S-F081254PQH5/download/cyan_3osd_3react_bal_vs_unbal_4krandread_osd_cpu.png |
2024-11-19T11:05:02.647Z | <Jose J Palacios-Perez> https://files.slack.com/files-pri/T1HG3J90S-F081A2T0V8W/download/cyan_3osd_3react_bal_vs_unbal_4krandread_osd_mem.png |
2024-11-19T11:09:55.219Z | <Jose J Palacios-Perez> FIO utilization looks consistent across the tests: https://files.slack.com/files-pri/T1HG3J90S-F081DRW2CMB/download/cyan_3osd_3react_bal_vs_unbal_4krandread_fio_cpu.png |
2024-11-19T11:09:55.220Z | <Jose J Palacios-Perez> https://files.slack.com/files-pri/T1HG3J90S-F081VC79Z5X/download/cyan_3osd_3react_bal_vs_unbal_4krandread_fio_mem.png |
2024-11-19T11:12:24.335Z | <Jose J Palacios-Perez> • there does not seem to be any significant performance difference between the configurations
• the default (unbalanced CPU core allocation) seems to use less memory than the balanced configurations, which does not make sense to me: they have the same number of OSDs and reactors, so why would they need to allocate more? I will look at the raw data to rule out a bug in my scripts. Note that for 3 OSDs, the default configuration runs all of them on the same CPU socket, whereas in the new balanced configuration two OSDs (0 and 2) are allocated to CPU socket 0 and only OSD 1 is allocated to CPU socket 1. From the PR Matan posted (<https://github.com/ceph/ceph/pull/52404>) I can only guess that the new balanced configuration incurs additional memory copies across CPU sockets |
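For readers following the socket placement described above, here is a minimal sketch of a round-robin OSD-to-socket assignment. It is not the actual balancing script from this thread; the function name `assign_osds_to_sockets` and the two-socket machine are assumptions made for illustration. It happens to reproduce the placement reported for the balanced run (OSDs 0 and 2 on socket 0, OSD 1 on socket 1).

```python
from collections import defaultdict


def assign_osds_to_sockets(num_osds, num_sockets):
    """Hypothetical round-robin placement: OSD i goes to socket i % num_sockets."""
    placement = defaultdict(list)
    for osd_id in range(num_osds):
        placement[osd_id % num_sockets].append(osd_id)
    return dict(placement)


# Assumed machine with 2 CPU sockets and 3 OSDs, as in the run discussed above.
balanced = assign_osds_to_sockets(num_osds=3, num_sockets=2)
print(balanced)  # {0: [0, 2], 1: [1]}

# The default run described above is equivalent to a single-socket placement.
default = assign_osds_to_sockets(num_osds=3, num_sockets=1)
print(default)  # {0: [0, 1, 2]}
```

With an odd number of OSDs on two sockets, one socket always hosts one more OSD than the other, so any per-socket comparison of CPU or memory should account for that asymmetry.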
2024-11-19T11:16:32.216Z | <Jose J Palacios-Perez> @Matan Breizman: bad news, the comparison of results suggests that CPU balancing does not improve performance 😢
(response curves IOPs vs latency, OSD CPU util and OSD MEM util, resp.) |
2024-11-19T11:49:13.330Z | <Matan Breizman> > suggest that the CPU balance does not improve performance
I don't think that we can infer that from the last run. IIUC, each OSD had 3 reactor threads. It's highly possible, given the machine has enough cores and it's not busy with other workloads, that the "unbalanced" run resulted in roughly the same allocation as the balanced one did. |
2024-11-19T11:50:29.051Z | <Matan Breizman> The real difference would most likely show up once there are enough reactor threads that some might run on sibling cores, unlike in the "balanced" version, where we would restrict that |
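One way to test the sibling-core hypothesis above is to check whether a given reactor allocation places two reactors on hardware threads of the same physical core. The sketch below is illustrative only: the helper `sibling_conflicts`, the `cpu_to_core` mapping, and the 4-core, 2-threads-per-core machine are assumptions, not data from these runs. On Linux, the real sibling relationships can be read from `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`.

```python
from collections import defaultdict


def sibling_conflicts(cpu_to_core, reactor_cpus):
    """Return {physical core: [logical CPUs]} for cores hosting more than one reactor."""
    by_core = defaultdict(list)
    for cpu in reactor_cpus:
        by_core[cpu_to_core[cpu]].append(cpu)
    return {core: cpus for core, cpus in by_core.items() if len(cpus) > 1}


# Hypothetical machine: 1 socket, 4 physical cores, 2 hardware threads per core.
# Assumed numbering: logical CPUs n and n + 4 are siblings on physical core n.
cpu_to_core = {cpu: (0, cpu % 4) for cpu in range(8)}

spread = [0, 1, 2]  # three reactors, each on its own physical core
packed = [0, 4, 1]  # two reactors share physical core 0 via sibling threads

print(sibling_conflicts(cpu_to_core, spread))  # {}
print(sibling_conflicts(cpu_to_core, packed))  # {(0, 0): [0, 4]}
```

Running such a check against the actual CPU sets of the "unbalanced" and "balanced" runs would show whether the two allocations differed at the physical-core level at all.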
2024-11-19T11:52:45.450Z | <Matan Breizman> Can we separate the visualizer from the balancing script? I think that it has value on its own |
2024-11-19T15:41:48.057Z | <Jose J Palacios-Perez> Yes, they are separate scripts 👍 Unfortunately I deleted the previous socket-based balancing function; I'll try to reconstruct it and keep it as a separate (albeit experimental) function |
2024-11-19T15:50:46.416Z | <Jose J Palacios-Perez> Yes, I was also wondering whether a larger configuration (e.g. 8 OSDs, each with 4 or 5 reactors) might show a different picture, since the machine will be under a higher load. The above results are for 3 OSDs with 3 reactors each. I'll restore the old balancing as a separate function and trigger the same tests for the larger configuration asap. |
2024-11-19T15:56:41.101Z | <Jose J Palacios-Perez> This is the default CPU allocation for a 3 OSD 3 reactor (cyanstore): all the reactors use a single socket: https://files.slack.com/files-pri/T1HG3J90S-F081J7T196E/download/cyan_3osd_3reac_default.png |
2024-11-19T16:09:00.960Z | <Jose J Palacios-Perez> At least in terms of resource consumption, my balancing approach (socket-based) is not the worst |