
Heck, we saw crazy performance degradation with Redis once its memory usage exceeded a single NUMA node. There's not much to be done about that at the k8s level when Redis is single-threaded; you have to be super conscious of the underlying hardware at that point.
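A rough way to check whether you're in that situation on Linux. This is a sketch that assumes the standard sysfs layout under `/sys/devices/system/node`, and all the helper names are hypothetical. It compares a process's footprint (for Redis, e.g. the `used_memory` field from `INFO memory`) against each node's `MemTotal`:

```python
import glob
import os
import re
from typing import Dict

# Hypothetical helpers. Assumes Linux sysfs, where each NUMA node exposes
# /sys/devices/system/node/nodeN/meminfo with lines such as
# "Node 0 MemTotal:       263844212 kB".
NODE_ROOT = "/sys/devices/system/node"


def read_node_meminfo(root: str = NODE_ROOT) -> Dict[int, str]:
    """Return {node_id: raw meminfo text} for every NUMA node under root."""
    result: Dict[int, str] = {}
    for path in glob.glob(os.path.join(root, "node[0-9]*", "meminfo")):
        node_dir = os.path.basename(os.path.dirname(path))  # e.g. "node0"
        with open(path) as f:
            result[int(node_dir[len("node"):])] = f.read()
    return result


def parse_node_memtotal_kb(meminfo_text: str) -> int:
    """Extract the node's MemTotal, in kB, from its meminfo text."""
    match = re.search(r"^Node\s+\d+\s+MemTotal:\s+(\d+)\s+kB",
                      meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no MemTotal line found")
    return int(match.group(1))


def fits_in_one_node(used_bytes: int, node_meminfo: Dict[int, str]) -> bool:
    """True if used_bytes is no larger than the biggest single node's MemTotal.

    This is only an upper bound: the kernel, page cache and other processes
    also allocate on that node, so the real headroom is smaller.
    """
    largest_kb = max(parse_node_memtotal_kb(t) for t in node_meminfo.values())
    return used_bytes <= largest_kb * 1024


if __name__ == "__main__":
    nodes = read_node_meminfo()
    for node_id in sorted(nodes):
        print(f"node{node_id}: {parse_node_memtotal_kb(nodes[node_id])} kB")
```

If the dataset can't fit in one node, the usual knobs are `numactl --cpunodebind` plus `--membind` to keep everything local, or `numactl --interleave=all` to at least spread the remote-access penalty evenly instead of hitting it as a cliff.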

by gopalv 2 hours ago

> Kubernetes deployed on a server with hundreds of CPU cores

Was that a Power9 or some sort of IBM machine?

Not all NUMA is the same: Intel's ccNUMA is a different beast from the PPC version of the same thing.

by re-thc 8 hours ago

Is this on AMD? I wonder whether it all comes down to NUMA or to their CCD architecture, etc. (though these days Intel and everyone else does it to some extent too).

by Twirrim 6 hours ago

Intel suffers just as much once NUMA enters the picture, even before CCD-style architectures. That extra latency hop across to the other socket to get at memory is absolutely crippling, especially in a hot loop. It requires very careful handling, yet it's an invisible element: unless you know to look for it, nothing will draw your attention to it.

by toast0 8 hours ago

Hundreds of cores likely means two sockets, so you've got NUMA there.

Scaling to large core counts has a lot of gotchas.
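To see how the cores actually split across nodes on a Linux box: each node lists its logical CPUs in `/sys/devices/system/node/nodeN/cpulist` using range syntax like `0-63,128-191`. Below is a minimal sketch that parses that format. The function names are hypothetical, and the counts are logical CPUs, so SMT siblings are counted separately.

```python
import glob
import os
from typing import Dict, List


def parse_cpulist(cpulist: str) -> List[int]:
    """Expand a kernel cpulist string such as "0-3,8,10-11" into CPU ids.

    Handles plain ranges and single ids, which is what sysfs emits here.
    An empty list (e.g. a memory-only node) yields [].
    """
    cpus: List[int] = []
    for part in cpulist.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.extend(range(int(start), int(end) + 1))
        else:
            cpus.append(int(part))
    return cpus


def cpus_per_node(root: str = "/sys/devices/system/node") -> Dict[int, int]:
    """Return {node_id: logical CPU count} read from each node's cpulist."""
    counts: Dict[int, int] = {}
    for path in glob.glob(os.path.join(root, "node[0-9]*", "cpulist")):
        node_dir = os.path.basename(os.path.dirname(path))  # e.g. "node1"
        with open(path) as f:
            counts[int(node_dir[len("node"):])] = len(parse_cpulist(f.read()))
    return counts


if __name__ == "__main__":
    counts = cpus_per_node()
    print(f"{len(counts)} NUMA node(s): {counts}")
```

More than one node in the output means cross-node memory hops are possible; some platforms also expose several nodes per socket, so the node count isn't always the socket count.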

by CarRamrod 7 hours ago

There is one instance where the NUMA performance never disappoints: https://www.youtube.com/watch?v=Cqd1Gvq-RBY

by drunkboxer 5 hours ago

There are in fact two instances https://www.youtube.com/watch?v=ZBKm1MBsTbk