After a careful reading, I'm surprised that the small IRQ squares add up to 30%. I should search for interrupts the next time I inspect our flamegraphs.
Edit: I wrote about that setup and other Linux/PCIe root complex topology issues I hit back in 2021:
I think your test had ten 980 Pros, which were probably around $120 each at the time (~$1200 total). SSDs are wildly more expensive now, but even if you spend $500 each, it's nowhere close to EBS.
It's apples vs oranges, but sometimes you just want fruit.
You suggest some very interesting measurements. I'll keep them in mind and try them in my next experiments. I wish I had read this earlier so I could have applied it during the past runs :)