I'm not sure everyone uses the terms consistently, but the difference is that the old "shared" memory reserved a section of RAM to act as VRAM under the control of the GPU, ignored by the OS. The CPU ran the same kind of code as for a discrete card, pretending there was a "bus transfer" between host memory and graphics memory.

In unified memory, all the memory is host memory and data can go from program to GPU with zero copy movements. The addresses of buffers can be shared via appropriate MMU translation support, so that the application and graphics subsystem are communicating effectively through the basic RAM cache coherency protocols over the same buffers.

Edit to add: Aside from the zero-copy transfer potential, it also means dynamic allocation strategies can shift the balance between host and graphics allocations on the fly. Individual image and message buffers can be allocated as needed instead of setting a static split between the two worlds.
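
To make the copy vs. zero-copy distinction concrete, here is a loose analogy in plain Python, not real GPU code. The `bytearray` stands in for a buffer in RAM, `bytes(...)` plays the role of the old "bus transfer" copy, and `memoryview` plays the role of both sides looking at the same buffer. All names are made up for illustration.

```python
# Toy analogy only: a bytearray stands in for a buffer in the RAM pool.
frame = bytearray(8)             # buffer the "application" writes into

# Old "shared" model: the consumer gets its own copy, as if over a bus transfer.
copied = bytes(frame)

# Unified model: the consumer gets a view of the very same bytes, no copy made.
shared = memoryview(frame)

frame[0] = 0xFF                  # the application updates the buffer afterwards

print(copied[0])                 # the copy still holds the old value
print(shared[0])                 # the view sees the update immediately
```

The copy has to be refreshed every time the data changes, while the view never does, which is roughly the saving being described.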

reply
That's my understanding, or, maybe a better word would be "guess". The CPU telling the GPU: this is your memory now.
reply
in modern days, it should be the same, just different marketing teams push different terms.

but if you are talking about the old days: with shared memory back then, even with one physical RAM pool, the system enforced logical boundaries. the cpu had its partition of the ram and address space, and the gpu had its own partition of the same ram and address space. to transfer data you needed to copy from the gpu space to the cpu space and vice versa.

with unified memory, both cpu and gpu share the same ram and the same virtual address space. there is no partitioning and no copying between the two, since they share the same virtual addresses.

reply
For these specifically, the memory appears basically transparently to the GPU. There's a lot of software/firmware stuff for this, but also a different hardware architecture: while the RAM is on the CPU die, NVLink-C2C gives it extremely low latency and 600GB/s bandwidth between the GPU and CPU.
reply
Shared memory of the past meant reserving a part of the memory for the GPU, which could then not be used or accessed by the CPU. If the CPU wanted to access something, it had to copy it from the GPU's section of the memory to its own. Unified memory means both just fully share the same memory.
reply
Marketing, mostly? But perhaps also more flexibility with how much memory the GPU can directly access without reserving it.
reply
No. Let’s define terms, since as others have pointed out they’re not used perfectly.

Unified memory is what Apple is doing, what other phones do, and what many low-end built-in GPUs have done in PCs for ages. There is only one physical memory pool. Both the CPU and GPU can access it at full speed.

This means no copying between pools of memory. No speed penalty accessing the CPU memory from GPU or vice versa. If the GPU only needs 2 GB to draw the desktop it only uses 2 GB of the pool. Or it can use 45 GB if it needs it and the CPU doesn’t. But all memory has to be the same speed, and that ain’t cheap given how fast GPUs like things. I don’t know if expandable memory is possible, and since they use the same bus they compete for bandwidth. Seems theoretically easier to program for to me.

The opposite is what’s been common in graphics cards since the 2D era. CPU and GPU have their own memory and can talk over PCI/AGP/PCI-E. This is what I think they mean by shared memory; if it’s not, what’s the point in touting unified?

In this model, if the GPU uses 2 GB of its 12 GB total, the other 10 isn’t available to the OS at full speed, and I’m not aware of any operating systems that would use it for programs/cache by default. If the GPU needs 45 GB… too bad. You have to page things in and out of GPU memory over the much slower system bus. Starting a game means loading assets into main memory then transferring them to the GPU (newer tech can accelerate this). But the CPU can have slower memory than the GPU, saving money. Memory expansion on the CPU side is easy. And the CPU saturating its memory bus has no effect on the speed of the GPU memory bus because it’s physically separate. It's a more complicated memory model, but it’s the one everyone is used to.

Which is better is a matter of opinion and workload needs.

reply
Yes, I know there is an actual difference vs. dedicated GPUs with their own VRAM. I say it's marketing because Apple popularized the unified memory term even though, as you said, it existed in iGPUs long before Apple Silicon and was called shared GPU memory.

> I don’t know if expandable memory is possible

It technically is. These new systems (mostly) get their high bandwidth by using more channels (wider bus) of normal RAM modules. A system that has LPCAMM2 sockets should allow using the same LPDDR5X memory but you'd need a socket per two channels. A typical PC only supports two channels so having four (two sockets) would double the bandwidth.
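
A back-of-the-envelope sketch of the channel arithmetic, in Python. Theoretical peak bandwidth is just channels × bytes per transfer × transfers per second. The 64-bit channel width and the 8533 MT/s rate below are assumed figures picked for illustration, not specs of any particular system, and the function name is made up.

```python
def peak_bandwidth_gb_s(channels: int, bits_per_channel: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak: channels x bytes per transfer x transfers per second."""
    bytes_per_transfer = bits_per_channel / 8
    return channels * bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9

# Assumed illustrative figures: 64-bit channels at 8533 MT/s.
two_channels = peak_bandwidth_gb_s(2, 64, 8533)
four_channels = peak_bandwidth_gb_s(4, 64, 8533)

print(f"2 channels: {two_channels:.1f} GB/s")
print(f"4 channels: {four_channels:.1f} GB/s")
print(f"ratio: {four_channels / two_channels:.1f}x")
```

Real-world throughput lands below these peaks, but the ratio between the two configurations is the point.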

reply
System RAM has much lower bandwidth and less predictable access. Notably, the transfer from system to GPU is very slow. About 30x slower. LLMs aren’t designed to queue or parallelise operations to account for this. They just become much slower.
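
A crude model of why spilling hurts so much, in Python. It assumes every byte of the weights is read once per pass, with no overlap between the fast and slow reads, and it takes the "about 30x slower" figure above at face value. The sizes and the 1000 GB/s VRAM bandwidth are hypothetical numbers, and the function name is made up.

```python
def time_per_pass_s(model_gb: float, vram_gb: float,
                    vram_bw_gb_s: float, slowdown: float = 30.0) -> float:
    """Seconds to read all weights once, with no overlap of fast and slow reads."""
    in_vram = min(model_gb, vram_gb)          # part served from GPU memory
    spilled = max(model_gb - vram_gb, 0.0)    # part fetched from system RAM
    slow_bw = vram_bw_gb_s / slowdown         # assumed bandwidth of the slow path
    return in_vram / vram_bw_gb_s + spilled / slow_bw

# Hypothetical numbers: 16 GB of VRAM at 1000 GB/s.
fits = time_per_pass_s(model_gb=16, vram_gb=16, vram_bw_gb_s=1000)
spills = time_per_pass_s(model_gb=24, vram_gb=16, vram_bw_gb_s=1000)

print(f"fits in VRAM:  {fits:.3f} s per pass")
print(f"8 GB spilled:  {spills:.3f} s per pass")
```

Under these assumptions, spilling a third of the model makes each pass many times slower, because the slow path dominates the total.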
reply