Heck, we saw crazy performance degradation with Redis when its memory usage exceeded a single NUMA node. Not much can be done about that at the k8s level when Redis is single-threaded. You have to be super conscious of the underlying hardware at that point.
> Kubernetes deployed on a server with hundreds of CPU cores

Was that a Power9 or some sort of IBM machine?

Not all NUMA is the same: Intel's ccNUMA is a different beast from the PPC implementation.

Is this on AMD? I wonder whether it all comes down to NUMA or to their CCD architecture, etc. (though these days Intel and everyone else also do this to some extent).
Intel suffers just as much when NUMA enters the picture, even before CCD-style architectures. That extra latency hop across to the other node to reach its memory is absolutely crippling, especially in a hot loop. It requires very careful handling, yet it's largely invisible: unless you know to look for it, nothing will draw your attention to it.
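Since NUMA is easy to miss, a quick check of the topology helps. A minimal sketch, assuming a Linux host that exposes NUMA nodes under `/sys/devices/system/node` (each `nodeN` directory has a `cpulist` file such as `0-3,8-11`). The helper names `parse_cpulist` and `numa_nodes` are hypothetical; `numactl --hardware` or `lscpu` report the same information if installed.

```python
import glob
import os
import re


def parse_cpulist(text: str) -> list[int]:
    """Expand a Linux cpulist string such as "0-3,8-11" into CPU ids."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            # Memory-only nodes have an empty cpulist.
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.extend(range(int(start), int(end) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(root: str = "/sys/devices/system/node") -> dict[int, list[int]]:
    """Map each NUMA node id to its CPUs; returns {} if sysfs is unavailable."""
    nodes: dict[int, list[int]] = {}
    for path in glob.glob(os.path.join(root, "node[0-9]*")):
        match = re.search(r"node(\d+)$", path)
        if match is None:
            continue
        with open(os.path.join(path, "cpulist")) as f:
            nodes[int(match.group(1))] = parse_cpulist(f.read())
    return dict(sorted(nodes.items()))


if __name__ == "__main__":
    nodes = numa_nodes()
    for node_id, cpus in nodes.items():
        print(f"node{node_id}: {len(cpus)} CPUs")
    if len(nodes) > 1:
        # More than one node means some memory accesses will be remote.
        print("Multiple NUMA nodes detected: remote-memory latency applies.")
```

If this reports more than one node, pinning a single-threaded process (e.g. with `numactl --cpunodebind` / `--membind`) is one way to keep its memory local.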
Hundreds of cores is likely two sockets and so you've got NUMA there.

Scaling to large core counts has a lot of gotchas.

There is one instance where the NUMA performance never disappoints: https://www.youtube.com/watch?v=Cqd1Gvq-RBY