Filesystem calls tend to be slower than the average syscall because of all the locking and complicated traversal involved. If this is using stat() rather than fstat(), each call also goes through the VFS layer - repeated lookups likely hit the dentry-cache fast path for path resolution, but you still pay to walk the path and copy out the stat structure. There are also hidden costs that don't show up in that number: atomic operations that bounce cache lines and cause contention for other processes on the CPU, plus the cache dirtying from running kernel code, which then has to be repopulated after returning - all of which adds contended L3/RAM pressure.

In other words, there's a lot of unmeasured performance degradation as a side effect of making many syscalls, above and beyond the CPU time to enter and leave the kernel, which itself has shrunk to be negligible. There's a reason high-performance code is moving to io_uring to avoid that.

reply
Oh cool, so using io_uring plus PRAGMA data_version would actually beat stat() on Linux, holistically speaking? The stat choice was all about cross-platform consistency over inotify speed. But syscall overhead can be real.
reply
“Beat” is relative - it depends on load and how frequently you're polling, but generally yes. That said, if you're reaching for io_uring you may as well use inotify, since you're already in platform-specific API territory. That's the biggest win: you move from polling to change notification, which is both lower overhead and lower latency. The inotify fd can be read through io_uring, and there may even be cross-platform libraries for your language that give you a consistent file-watcher interface (though probably not optimally over io_uring). Whether it's actually worth it is hard to say without knowing the problem you're trying to solve, but the lowest-overhead option looks like inotify + io_uring (it also has the lowest latency).
reply
If you're interested, you can use kqueue on FreeBSD and Darwin to watch the vnode for changes. Cheaper than polling with repeated syscalls, especially if all you need is a wakeup when the file changes.
reply
That’s ignoring the other costs of syscalls like evicting your stuff from the CPU caches.

But I agree with the conclusion, system calls are still pretty fast compared to a lot of other things.

reply
Small correction on ambiguous wording - syscalls do not evict all your stuff from the CPU caches. The kernel only has to pull in whatever code/data the call actually touches, which is no different from doing the same work in-process as a normal function call.
reply
Depending on implementation details of your CPU and OS, the syscall path may need to flush various auxiliary caches (like one or more TLBs) to prevent speculation attacks, which may put additional "drag" on your program after syscall return.
reply
Correct, but you'd still have that drag just from the kernel dirtying those caches in the first place.

But I was clarifying because the wording could be read as meaning the data/instruction caches, and there generally isn't a full flush of those just to enter/leave the kernel.

reply