2024-11-15T07:53:50.989Z | <Yingxin Cheng> Rough performance measurements after implementing seastore partial reads, far from perfect, but might be interesting :) <https://github.com/ceph/ceph/pull/60654#issue-2640092642> |
2024-11-15T11:40:47.344Z | <Jose J Palacios-Perez> I need to disable the heuristic (the stop criterion for the response latency tests): if the latency standard deviation is too high, the test does not increase the IO load, which ends the main loop. Despite that, 4k random read looks pretty good: the OSD CPU utilisation remains constant, and the client FIO (using libRBD) becomes the bottleneck. This is a small, balanced config (3 OSDs, 3 Seastar reactors per OSD): https://files.slack.com/files-pri/T1HG3J90S-F0811FBCUD9/download/cyan_3osd_3react_bal_1procs_randread_iops_vs_lat_vs_cpu.png |
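A minimal sketch of the kind of stop criterion described above, not the actual test script: the function names, the `max_cov` threshold, and the use of the coefficient of variation (standard deviation divided by mean) as the "too high" test are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def latency_too_noisy(latencies_ms, max_cov=0.5):
    """Return True when the latency spread is too high to keep ramping the load.

    Hypothetical criterion: coefficient of variation (std dev / mean) above max_cov.
    """
    avg = mean(latencies_ms)
    if avg == 0:
        return False
    return pstdev(latencies_ms) / avg > max_cov


def ramp_io_load(run_step, iodepths, max_cov=0.5, use_heuristic=True):
    """Run one measurement step per IO depth and return the depths that were run.

    run_step is a hypothetical callable: it takes an IO depth and returns the
    list of latency samples (ms) observed at that depth.
    With use_heuristic=True the main loop ends as soon as a step is too noisy.
    """
    completed = []
    for depth in iodepths:
        latencies_ms = run_step(depth)
        completed.append(depth)
        if use_heuristic and latency_too_noisy(latencies_ms, max_cov):
            break  # do not increase the IO load any further
    return completed
```

Passing `use_heuristic=False` corresponds to disabling the criterion, so every IO depth in the ramp is measured regardless of the latency spread.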
2024-11-15T11:51:20.334Z | <Jose J Palacios-Perez> FIO CPU and MEM utilisation resp.: (I'll coalesce all the msgr-workers in a single curve group): https://files.slack.com/files-pri/T1HG3J90S-F081E9RSGMP/download/fio_cyan_3osd_3react_bal_1procs_randread_top_cpu.png |
2024-11-15T11:51:20.336Z | <Jose J Palacios-Perez> https://files.slack.com/files-pri/T1HG3J90S-F080M3E9YKZ/download/fio_cyan_3osd_3react_bal_1procs_randread_top_mem.png |
2024-11-15T11:53:50.401Z | <Jose J Palacios-Perez> The following are the OSD CPU and MEM utilisation charts. The CPU one would be interesting to compare against an unbalanced configuration: https://files.slack.com/files-pri/T1HG3J90S-F0811GULQ83/download/osd_cyan_3osd_3react_bal_1procs_randread_top_cpu.png |
2024-11-15T11:53:50.403Z | <Jose J Palacios-Perez> https://files.slack.com/files-pri/T1HG3J90S-F080M3NUA3Z/download/osd_cyan_3osd_3react_bal_1procs_randread_top_mem.png |
2024-11-15T13:25:00.675Z | <Brett Niver> interesting, not sure what to make of it yet, but interesting 🙂 |
2024-11-15T14:43:09.045Z | <Jose J Palacios-Perez> Update: Bill suggested that a better balance for improved performance would be in terms of OSDs (rather than splitting the reactors of the same OSD across the available sockets). In other words, all the reactors of the same OSD should run in the same socket:
• OSD0: 3 reactors in the same socket 0: 0-3,
• OSD1: 3 reactors in socket 1: 28-30,
• OSD2: 3 reactors in socket 0: 4-6
@Matan Breizman, @Yingxin Cheng, what are your thoughts please? |
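A rough sketch of the per-OSD balance suggested above, assuming a hypothetical two-socket topology (cores 0-27 on socket 0 and 28-55 on socket 1). The function name and the contiguous packing are illustrative, so the exact core IDs it produces can differ from the ranges listed in the message.

```python
def balance_per_osd(num_osds, reactors_per_osd, socket_cores):
    """Pin all reactors of each OSD to one socket, alternating sockets per OSD.

    socket_cores maps a socket id to its available core ids
    (a hypothetical topology, not read from the machine).
    Returns {osd_id: (socket_id, [core ids])}.
    """
    free = {socket: list(cores) for socket, cores in socket_cores.items()}
    sockets = sorted(free)
    layout = {}
    for osd in range(num_osds):
        socket = sockets[osd % len(sockets)]  # round-robin OSDs over sockets
        if len(free[socket]) < reactors_per_osd:
            raise ValueError(f"socket {socket} has too few free cores for OSD {osd}")
        layout[osd] = (socket, free[socket][:reactors_per_osd])
        free[socket] = free[socket][reactors_per_osd:]  # mark those cores as used
    return layout


if __name__ == "__main__":
    topology = {0: range(0, 28), 1: range(28, 56)}  # assumed core numbering
    for osd, (socket, cores) in balance_per_osd(3, 3, topology).items():
        print(f"OSD{osd}: socket {socket}, cores {cores}")
```

The contrast with the current balanced config is that no OSD has its reactors spread across both sockets; only whole OSDs are distributed.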
2024-11-15T14:55:59.832Z | <Jose J Palacios-Perez> This is the acceptance criterion I mentioned above. I've disabled it and re-run the tests, and I will think about reorganising my Python script to attempt the balance per OSD: https://files.slack.com/files-pri/T1HG3J90S-F080VR152TG/download/heuristic-response-curves.png |
2024-11-15T15:45:23.938Z | <Jose J Palacios-Perez> This is the acceptance criterion I mentioned above. I've disabled it and will re-run the tests shortly, and I will think about reorganising my Python script to attempt the balance per OSD instead 🤔 |
2024-11-15T20:01:38.271Z | <Jose J Palacios-Perez> I need to disable the heuristic (the stop criterion for the response latency tests): if the latency standard deviation is too high, the test does not increase the IO load, which ends the main loop. Despite that, 4k random read looks pretty good: the OSD CPU utilisation remains constant, and the client FIO (using libRBD) becomes the bottleneck. This is a small, balanced config (3 OSDs, 3 Seastar reactors per OSD).
(I found an issue with the original chart, which showed the wrong CPU utilisation; the amended .png is below.) |
2024-11-15T20:03:09.996Z | <Jose J Palacios-Perez> Amended chart, with the OSD CPU corrected: https://files.slack.com/files-pri/T1HG3J90S-F08117S63A9/download/cyan_3osd_3react_bal_1procs_randread_iops_vs_lat_vs_cpu.png |