In other words, doing many syscalls causes a lot of unmeasured performance degradation as a side effect, above and beyond the CPU time to enter/leave the kernel, which itself has shrunk to be negligible. There’s a reason high-performance code is switching to io_uring to avoid that.
But I agree with the conclusion: system calls are still pretty fast compared to a lot of other things.
I was only clarifying because the wording could be read as referring to the data/instruction caches, and there generally isn’t a full flush of those just to enter/leave the kernel.
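
To make the "many syscalls vs. fewer syscalls" point a bit more concrete, here is a rough sketch in Python that writes the same bytes either one chunk per write(2) call or joined into a single call. It is not io_uring, just plain user-space batching, and the function names (`write_unbatched`, `write_batched`, `timed`) are made up for illustration. Big caveat: the wall-clock numbers mostly reflect interpreter overhead plus direct syscall cost, and they say nothing about the indirect effects mentioned above, so treat the output as illustrative only.

```python
import os
import time


def write_unbatched(fd, chunks):
    """Issue one write(2) syscall per chunk; return the number of calls made."""
    calls = 0
    for chunk in chunks:
        os.write(fd, chunk)
        calls += 1
    return calls


def write_batched(fd, chunks):
    """Join the chunks in user space, then issue a single write(2)."""
    os.write(fd, b"".join(chunks))
    return 1


def timed(fn, chunks):
    """Run fn against the null device; return (syscall count, elapsed seconds)."""
    fd = os.open(os.devnull, os.O_WRONLY)
    try:
        start = time.perf_counter()
        calls = fn(fd, chunks)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return calls, elapsed


if __name__ == "__main__":
    # Chunk size and count are arbitrary values picked for the sketch.
    chunks = [b"x" * 64] * 100_000
    for fn in (write_unbatched, write_batched):
        calls, elapsed = timed(fn, chunks)
        print(f"{fn.__name__}: {calls} syscalls, {elapsed * 1e3:.2f} ms")
```

Seeing the side effects themselves would take something like hardware performance counters on a real workload rather than a wall-clock loop like this.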