> The Unified Memory pool is the “game changer”

M1 knocking from 2020.

Game changed, past tense, six years ago. This is catch-up.

That was the main reason for the big hype around memristors 15 years ago: high-density, high-speed persistent memory that would completely remove the need for HDDs/SSDs, and potentially even the need for external memory altogether. It's frustrating that we still seem to be a long way from that becoming reality. There's some renewed interest in memristors because they can simulate the connections in neural network models, so maybe the funding will return.
And here I am with a 128GB Strix Halo, longingly eyeing the Blackwell cards that spit out tokens at 10-20x the speed.

The question, I suppose, is what ultimate shape of knowledge compression and bandwidth optimization we arrive at.

It’s also the reason why you will never be able to repair or upgrade your computer in the future. From a technological point of view, these are indeed big advancements.

However, I couldn’t care less about a faster CPU when 1. it limits my ability to upgrade my system, and 2. Windows keeps getting more bloated and slower.

What is the difference between unified memory and shared memory?

Shared memory has existed since the first CPU with an embedded GPU came to market, and you could set in the BIOS how much memory goes to which component.

I do have an opinion about how unified memory could be different, but I want a proper explanation.

Shared memory of the past meant reserving a part of the memory for the GPU, which could then not be used or accessed by the CPU. If the CPU wanted to access something, it had to copy it from the GPU's section of the memory to its own. Unified memory means both just fully share the same memory.
System RAM has much lower bandwidth and less predictable access. Notably, transfers from system RAM to the GPU are very slow: about 30x slower. LLMs aren’t designed to queue or parallelise operations to account for this; they just become much slower.
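To put the "about 30x" figure in perspective, here is a back-of-envelope sketch. It assumes the RTX 5090's ~1,792 GB/s VRAM bandwidth (quoted elsewhere in this thread) and roughly 63 GB/s per direction for a PCIe 5.0 x16 link; both are theoretical peaks, not measured throughput.

```python
# Back-of-envelope comparison of on-card VRAM bandwidth vs. the PCIe link.
# Both figures are assumed theoretical peaks in GB/s, not benchmarks.
VRAM_5090_GBPS = 1792.0  # RTX 5090 GDDR7 bandwidth
PCIE5_X16_GBPS = 63.0    # PCIe 5.0 x16, one direction, after 128b/130b encoding


def slowdown(local_gbps: float, link_gbps: float) -> float:
    """How many times slower streaming over the link is than reading local memory."""
    return local_gbps / link_gbps


ratio = slowdown(VRAM_5090_GBPS, PCIE5_X16_GBPS)
print(f"Pulling data over PCIe 5.0 x16 is ~{ratio:.0f}x slower than reading VRAM")
```

Real-world transfers land below the 63 GB/s peak, so the practical gap is, if anything, a bit wider.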
Memory safety is orthogonal to side-channels, and hardware-enforced isolation (e.g. IOMMU) is more powerful than compiler-enforced isolation (but both are good!)
If this thing only has as much GPU bandwidth as the Spark, it’s kinda pointless.
Unified memory is only a feature because NVidia so aggressively uses VRAM for market segmentation.

The 5090 ($2k MSRP but realistically $3-3.5k) is almost the same as the RTX 6000 Pro (~$10k). Same memory bandwidth (1800GB/s). Slightly different CUDA cores (21k vs 24k). Big difference? VRAM (32GB vs 96GB).
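Putting the quoted figures side by side makes the segmentation argument concrete. This sketch uses the $2k MSRP and ~$10k prices, ~1800 GB/s bandwidth, and ~21k vs ~24k CUDA core counts from the comment above (rounded numbers, not official spec sheets) and computes how much more of each spec the pricier card gives you.

```python
# Spec ratios between the two cards, using the rounded figures quoted above.
cards = {
    "RTX 5090": {"price_usd": 2000, "bandwidth_gbps": 1800, "cuda_cores": 21000, "vram_gb": 32},
    "RTX 6000 Pro": {"price_usd": 10000, "bandwidth_gbps": 1800, "cuda_cores": 24000, "vram_gb": 96},
}

base, pro = cards["RTX 5090"], cards["RTX 6000 Pro"]
# How many times more of each spec the RTX 6000 Pro offers over the 5090.
ratios = {spec: pro[spec] / base[spec] for spec in base}

for spec, r in ratios.items():
    print(f"{spec:>15}: {r:.2f}x")
```

At MSRP you pay 5x for 3x the memory, the same bandwidth, and ~1.14x the cores; at the realistic $3-3.5k street price for the 5090, the price gap narrows to roughly 3x.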

NVidia ultimately doesn't want to upset this segmentation so the RTX Spark will never undermine their other offerings. This is why I think Apple has a real market opportunity if they choose to embrace it.

To this day I do not get why Intel doesn't just offer massive memory options for their cards. Just charge what it costs to add the extra memory, no upcharge, and they will never be able to keep up with demand. Cheap VRAM is enough to justify a lot of open source investment into challenging CUDA.
Even low-VRAM cards are actually very useful for running the comparatively smaller dense layers in large local MoE models. This only requires transferring very small amounts of data across the PCIe bus (similar to pipeline parallelism), so it fits nicely around the existing bottlenecks on that hardware.
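A rough sketch of why the PCIe traffic stays small: per generated token, only hidden-state activations cross the bus, not weights. The model shape below (60 layers, hidden size 4096, FP16 activations, one round trip per layer) is a hypothetical example, and the sketch ignores per-transfer latency, which in practice matters more than raw bandwidth here.

```python
# Hypothetical model shape, for illustration only.
NUM_LAYERS = 60
HIDDEN_SIZE = 4096
BYTES_PER_ACTIVATION = 2          # FP16
PCIE4_X16_BYTES_PER_S = 31.5e9    # assumed PCIe 4.0 x16 peak, one direction


def bytes_per_token(layers: int, hidden: int, elem_bytes: int) -> int:
    # One hidden-state vector goes to the other device and back at every layer.
    return 2 * layers * hidden * elem_bytes


per_token = bytes_per_token(NUM_LAYERS, HIDDEN_SIZE, BYTES_PER_ACTIVATION)
# Pessimistically treat all traffic as if it shared a single direction of the link.
transfer_us = per_token / PCIE4_X16_BYTES_PER_S * 1e6
print(f"{per_token / 1024:.0f} KiB per token, ~{transfer_us:.0f} us of pure transfer time")
```

Under these assumptions that is under 1 MiB per token, compared with the many gigabytes of expert weights that stay in system RAM and never cross the bus.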
I have so many questions… Since Apple already sells unified memory systems, what is the market opportunity you envision? Do you see Nvidia and Apple as competitors, and how? (And I’m not suggesting they’re not, necessarily, but I want to hear where you’re coming from, and they do have very different markets.) Hasn’t Apple used storage size (RAM & disk) for market segmentation for decades? And how does a machine with 128GB unified mem not potentially cut into some people’s reasons for wanting a 96GB GPU?
I'm not the person you're replying to, but I wholeheartedly agree with them...

Quick background: doing AI inference requires three things. Lots of memory, lots of memory bandwidth, and of course plenty of compute that has access to that memory.

Quick reference: nVidia 5090 has 1,792 GB/sec bandwidth. 3090 gets about 1000 GB/sec. DGX Spark and AMD 395 whatever get about 275 GB/sec.

Apple M1 Max gets 400GB/sec, M5 Max gets 614GB/sec. Ultra variants get 2x that bandwidth, base variants get 1/2 that bandwidth. However... their compute is rather weak.

Right now, Apple's offerings are juuuuuust fast enough to run dense 27B models at usable speeds at like, 10% of the performance/watt of nVidia. They're world-leading general purpose CPUs but not killer GPUs.
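The bandwidth figures above translate fairly directly into a ceiling on generation speed: for a dense model, every weight has to be read from memory once per generated token. A minimal sketch, assuming a 27B dense model quantized to 4 bits (0.5 bytes per parameter) and ignoring KV-cache reads, compute limits, and overhead, so real numbers will be lower:

```python
# Upper bound on decode speed for a memory-bandwidth-bound dense model.
def max_tokens_per_sec(bandwidth_gb_s: float, params_billion: float, bytes_per_param: float) -> float:
    model_gb = params_billion * bytes_per_param  # all weights streamed once per token
    return bandwidth_gb_s / model_gb


# Bandwidth figures (GB/s) as quoted above.
BANDWIDTH = {
    "RTX 5090": 1792,
    "RTX 3090": 1000,
    "M5 Max": 614,
    "M1 Max": 400,
    "DGX Spark": 275,
}

ceilings = {name: max_tokens_per_sec(bw, 27, 0.5) for name, bw in BANDWIDTH.items()}
for name, tps in ceilings.items():
    print(f"{name:>10}: <= {tps:5.1f} tokens/s")
```

The ceiling scales linearly with bandwidth, which is why the Ultra parts (roughly 2x the Max bandwidth) attract so much interest; weak compute then determines how close a chip gets to that ceiling.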

By all accounts, these Windows PCs nVidia is touting seem to have DGX Spark like performance, which is less than impressive. Same with the upcoming AMD AI-oriented consumer stuff.

The other context here is that running your own AI at home is just starting to become feasible in terms of open model availability and the ability to run it at usable speeds. Many are interested in it for reasons of privacy, security, and cost certainty vs. buying tokens.

> Since Apple already sells unified memory systems, what is the market opportunity you envision?

nVidia and AMD can't make their consumer offerings too good at AI, because that risks interfering with their higher-margin data center sales.

(And, let's face it. Even if nVidia did release a 6090 with 64-128GB of memory for an affordable price, consumers wouldn't get their hands on them anyway because people would just start filling data centers with them)

So.

Now you see Apple's opportunity, right? No data center sales to interfere with. No relationship with nVidia or AMD to worry about.

They could choose to make an absolute beast of a home AI machine. The M5 Ultra, if announced, might be that. It's admittedly a niche market, but people are already buying 64GB+ Macs faster than Apple can make them and they're fetching high prices on the used market as well.

The only real questions are if this market is even something Apple would find time to care about, and if they could secure enough DRAM to make a go at it. They are enormous obviously but they're feeling the RAM pinch just like everybody.

Apple offers relatively affordable options for a high-memory workstation that uses unified memory. They previously offered 256/512GB Mac Studios (both discontinued). Because of this they can keep larger models in memory.

BUT you just can't compete with NVidia performance for LLM workloads (mostly inference) for two reasons:

1. The memory bandwidth just can't compete with a 5090 (1800GB/s). The best current Mac is ~900GB/s. That directly caps tokens/sec, which might be manageable, but there's another problem; and

2. The raw FLOPS just can't compete with even a 5090. It probably needs native FP4/FP8 support to at least reach number-format parity with NVidia. Beyond that, NVidia just has more raw FLOPS.

According to Google, an M5 Max does ~70 FP16 TFLOPS while a 5090 does 380. If Apple can close that gap to at least be competitive and also hold larger models in shared VRAM, that would be a competitive advantage and it would directly attack NVidia's market segmentation.
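Raw FLOPS matter most when processing the prompt (prefill), which is compute-bound rather than bandwidth-bound. A rough sketch, using the common approximation of ~2 FLOPs per parameter per token, a hypothetical 27B dense model and 8,000-token prompt, and the 70 vs 380 TFLOPS figures above, while assuming (unrealistically) that both chips run at their peak:

```python
# Rough prefill-time estimate for a compute-bound forward pass.
FLOPS_PER_PARAM_PER_TOKEN = 2  # common approximation for one forward pass


def prefill_seconds(params: float, prompt_tokens: int, peak_tflops: float) -> float:
    total_flops = FLOPS_PER_PARAM_PER_TOKEN * params * prompt_tokens
    return total_flops / (peak_tflops * 1e12)


PARAMS = 27e9          # hypothetical 27B dense model
PROMPT_TOKENS = 8000   # hypothetical prompt length

m5_max = prefill_seconds(PARAMS, PROMPT_TOKENS, 70)
rtx_5090 = prefill_seconds(PARAMS, PROMPT_TOKENS, 380)
print(f"M5 Max: ~{m5_max:.1f} s, RTX 5090: ~{rtx_5090:.1f} s before the first output token")
```

Whatever the absolute numbers, the ratio between the two is just the FLOPS ratio (~5.4x), so long prompts are where the compute gap is felt most.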

The Mac Studio was last updated in March last year, so we may get an update in Q3. Many are pinning their hopes on this, but it might not happen until next year. When it was released, the M4 was state of the art, and it came with either the M4 Max or the M3 Ultra (which, as I understand it, is basically two M3 Max chips joined together). What people are hoping for is an M5 Ultra with >1000GB/s of memory bandwidth, ideally 200+ FP16 TFLOPS, and hopefully FP8/FP4 support.

You can chain Mac Studios together into a cluster with TB5 too.

But it's reasonably likely that the next Mac Studio will be only incrementally better than the last generation.

> 5090 ($2k MSRP but realistically $3-3.5k)

These days, more like >$4.1K (at least in the US).

Intel was doing UMA with their i740 graphics in the late 90s. Codename TIMNA was cancelled, but they pioneered it and used it on their GPU/CPU chips as well as their breakthrough 810 chipset, which dominated the graphics market for a decade. It was despised because it was ubiquitous and a low-performing graphics engine, but games had to accommodate it.

Funny that it is getting credit only now.

Yeah, you only see double-digit performance degradation going from PCIe 5 to PCIe 3 with a 5090 (at x16); with everything else it's in the single digits.
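For scale, going from PCIe 5.0 to 3.0 at x16 cuts the link's peak bandwidth by 4x. A quick sketch of the per-generation figures, assuming the spec transfer rates and 128b/130b line encoding and ignoring all other protocol overhead:

```python
# Peak one-direction PCIe bandwidth, ignoring protocol overhead beyond line encoding.
TRANSFER_RATE_GT_S = {3: 8, 4: 16, 5: 32}  # gigatransfers per second per lane
ENCODING = 128 / 130                       # 128b/130b line code used since PCIe 3.0


def pcie_gb_s(gen: int, lanes: int = 16) -> float:
    # Each transfer carries one bit per lane; divide by 8 to get bytes.
    return TRANSFER_RATE_GT_S[gen] * ENCODING * lanes / 8


for gen in (3, 4, 5):
    print(f"PCIe {gen}.0 x16: ~{pcie_gb_s(gen):.1f} GB/s")
```

A 4x cut in link bandwidth costing only a double-digit (or single-digit) percentage of performance is consistent with games keeping their working set resident in VRAM.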
And the thing we gamers forget is that we’re the outlier. We’re the edge case.

Most consumers will never really care about, let alone see, the difference in PCIe or memory bandwidth impacts from such a shift to unified memory pools. We might (being, at least in my case, a huge nerd), but I’m increasingly of the opinion that if modern blockbuster games are built for upscaling/reconstruction anyhow, then suddenly such sacrifices to performance seem acceptable relative to the gains in efficiency.

Well, I mean, the idea with games is that it all fits in VRAM. You really don't want to be thrashing. The point is that transfers are still so slow that they must be avoided entirely, no?

Zero-copy unified memory will help with that, but you do pay the read-speed cost.

gen3 is 16 years old.
> (which is good for Rust adherents, I figure).

As a Rust adherent, please do not put words in our mouths or set up unrealistic expectations for other people by linking together concepts at a very shallow level.

Language-level memory safety has no answer for hardware security flaws, which is what side-channel attacks are. No programming language can provide memory privacy if another chip in your machine can read your memory, just as no programming language can protect your application from a vulnerability in the kernel it’s running on.

Damn. That wasn’t my intention at all. I was just pointing out that Rust has another reason to see wider adoption vis-à-vis the usual Valley advertising bullshit of deliberately conflating hardware security with software security. I personally give no fucks what something is written in, only that it’s written well enough that I don’t have to twist arms or babysit yet another sloppy piece of code in my enterprise.
But... it's rust.