On the M5 Pro/Max, the memory is attached directly to the GPU die; the CPU accesses it through the die-to-die bridge. From a memory-connectivity point of view, I don't see the difference between that and a pure GPU.

Wrt inference servers: sure, it's not cost-effective to carry such a huge CPU die and a bunch of media accelerators alongside the GPU die if all you care about is raw compute for inference and training. Apple SoCs aren't tuned for that market, and Apple doesn't sell into it. But I'm not building a datacentre; I'm trying to run inference on home hardware that I also want to use for other things.

reply
Industrial Scale Inference is moving towards LPDDR memory (alongside HBM), which is essentially what "Unified Memory" is.
reply
> which is essentially what "Unified Memory" is.

Unified memory means the CPU and GPU can reference the same memory address without data being copied. (CUDA lets you write code as if memory were unified even when it isn't, so that alone doesn't count, but HMM does count[1].)

That is all. The technology underneath is a hardware detail. Unified memory on Macs lets you put something into memory, then compute on it with the CPU, the ANE, or Metal shaders, all without copying anything.

DGX Spark also has unified memory.
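To make the "same address, no copies" definition concrete, here's a minimal CUDA sketch using `cudaMallocManaged`, which hands back a single pointer that both the CPU and GPU dereference directly (on hardware without true unified memory the driver migrates pages behind the scenes, which is exactly the "as if it was unified" case mentioned above):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// GPU kernel: increments each element in place through the shared pointer
__global__ void increment(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int n = 4;
    int *data;
    // One allocation, one pointer, valid on both CPU and GPU
    cudaMallocManaged(&data, n * sizeof(int));

    for (int i = 0; i < n; i++) data[i] = i;  // CPU writes
    increment<<<1, n>>>(data, n);             // GPU reads/writes the same pointer
    cudaDeviceSynchronize();                  // wait for the kernel to finish

    for (int i = 0; i < n; i++) printf("%d ", data[i]);  // CPU reads the results
    cudaFree(data);
    return 0;
}
```

No explicit `cudaMemcpy` appears anywhere; whether the bytes physically move is up to the hardware and driver, which is the "hardware detail" point above.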

[1]: https://docs.nvidia.com/cuda/cuda-programming-guide/02-basic...

reply
LPDDR is LPDDR. There's nothing "unified" about it architecturally.
reply
Unified memory is mainly how consumer hardware gets enough GPU-accessible RAM to run larger models; otherwise, market segmentation jacks up the price substantially.
reply
UMA removes the PCIe bottleneck and replaces it with a memory controller + bandwidth bottleneck. For most high-performance GPUs, that would be a direct downgrade.
reply