(arstechnica.com)
Meanwhile I hope my AM4 will chug along a few more years.
You can buy 128GB of DDR5-6000 with a 9950X3D (not this newest X2 version, but still a $699 CPU) and a motherboard and a case for $2800 right now: https://www.newegg.com/Product/ComboDealDetails?ItemList=Com...
If you don't need 128GB, there are quality 64GB kits for under $700 on Newegg right now, which is cheaper than this CPU.
If someone needs to build something now and can wait to upgrade RAM in a year or two, 32GB kits are in the $370 range.
I don't like this RAM price spike either, but in the context of building a high-end system with a 16-core flagship CPU like this and probably an expensive GPU, it's still reasonable to build a system. If you must have 128GB of RAM it can be done with bundles like the one I linked above but I'd recommend waiting at least 6 months if you can. There are signs that prices are falling now that panic-buying has started to trail off.
128GB of RAM should not cost $4K even in this market.
Last summer, a 9950X3D + motherboard + cooler + 128 GB DRAM + VAT sales taxes was the equivalent of $1400 in Europe, where I live.
That's half of your quoted price. That was without case and PSU, but adding e.g. $200 for those would not change much.
The RAM price was already inflated at that time, and the same kit is now £800. In October or earlier last year I'd have saved possibly the cost of the CPU or GPU on the whole build; now it'd be about that much more expensive.
On a side note for anyone not aware: the 9950X3D isn't the best choice for pure gaming; the 9850X3D is cheaper and marginally better. I also went with a 2-stick RAM kit, since 4 sticks are much harder to run at the advertised speed (6000), which is itself an overclock.
I'm a dev and a Linux user/gamer, hence my choice of CPU/GPU.
I don't really want to run my RAM that slow which is why I'll probably stick with two sticks.
I commented because someone thought that $4K was the going price for 128GB of RAM, which is way too much even with the demand crunch.
In January I was forced to upgrade an ancient Intel NUC, by replacing it with an Arrow Lake H based ASUS NUC. The complete system with 32 GB DRAM and 3 TB SSDs has cost EUR 1200, including VAT sales tax.
The distribution of the price was like this:
Barebone mini-PC: 41%
32 GB DDR5 SODIMMs: 26%
2 TB PCIe 5.0 SSD: 24%
1 TB PCIe 4.0 SSD: 9%
Since then, the prices of DDR5 and SSDs have continued to increase, so now the fraction spent on memory would be even higher than 59%. Before 2026, memory in such small amounts would have cost much less than the rest of the system.
6 or so weeks after I returned it the kit was listed at 1499.
The most I could get running on 10GB VRAM + 96GB RAM was a REAP'd + quantized version of MiniMax-M2.5
It's so bad. I don't get why they sell AM5 motherboards with 4 RAM slots.
At least that system has been running well for like two years. But had I known that the situation is so much more dire than with DDR4, I would've just gotten the same amount of RAM in two sticks rather than four.
Some motherboards have it off by default.
This is my first time off Intel and I have to say I don’t understand the hype.
The long POST times must mean it's retraining the memory each time, which is not normal. Just in case you haven't tried it yet, I'd start by reseating them; I've had weird issues with marginally seated RAM before.
Also, you definitely have to go much slower with 4 sticks than with two, so lower the speed as much as you can. If that doesn't help, I'd verify them in pairs.
If they work in pairs but not in quad at the slowest speed, something is surely wrong.
Once you get them working in quad, you can start bumping up the speed, might need voltage boost as well.
Cheapest 64GB kit is $930.
The kit I was oh-so-close to buying was two 6400 64GB sticks.
Not gonna buy now, not that desperate. I have a spare AM4 board, DDR4 memory and heck even CPU, I'll ride this one out. Likely skip AM5 entirely if something doesn't drastically change.
That's not far from the bundle deal above, once you subtract the $700 CPU.
If you really need 128GB the 5600 kit is fine. Having 208MB of total cache on the CPU means the real world difference between a 5600 kit and a slightly faster kit is negligible in most use cases.
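For anyone checking the cache math in this thread, the totals work out like so. This is a quick sketch using Zen 5's per-CCD figures (32 MB base L3, +64 MB per stacked V-Cache die, 1 MB L2 per core); treat them as assumptions rather than official spec sheets:

```python
# Cache totals for the 16-core parts discussed in this thread.
# Assumed Zen 5 per-CCD figures: 32 MB base L3, +64 MB per stacked
# V-Cache die, and 1 MB of L2 per core.
def total_cache_mb(ccds: int, vcache_ccds: int, cores: int) -> int:
    l3 = ccds * 32 + vcache_ccds * 64  # base L3 plus stacked V-Cache
    l2 = cores * 1                     # 1 MB private L2 per core
    return l3 + l2

print(total_cache_mb(ccds=2, vcache_ccds=1, cores=16))  # 9950X3D  -> 144
print(total_cache_mb(ccds=2, vcache_ccds=2, cores=16))  # 9950X3D2 -> 208
```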
If you don't need to upgrade then clearly don't force an upgrade right now. I just wanted to comment that $4K for 128GB of RAM is a very bad price right now, even with the current situation.
Does that “most use cases” caveat really apply to someone buying 128G of RAM? If I’m buying that much, it means I’m actually going to put it through its paces, unless it’s just there for huge reserved guest VM overhead.
If you’re trying to run LLMs off of the CPU instead of the GPU then the RAM speed dictates a lot. It’s going to be slow no matter what, though. Dual channel DDR5 just isn’t enough to run large LLMs that start to fill 128GB of RAM, and the difference between 5600 and 6400 isn’t going to make it usable.
If you’re just running a lot of VMs or doing a lot of mixed tasks that keep a lot of RAM occupied then you’d probably have a hard time measuring a difference between 5600 and 6400 if you tried with one of these X3D CPUs with a lot of cache.
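To put rough numbers on the CPU-inference point: decode speed is approximately bounded by how fast the weights can stream from RAM. Here's a back-of-envelope sketch, with the model size, channel count, and efficiency factor all being illustrative assumptions rather than measurements:

```python
# Back-of-envelope: CPU LLM decode is roughly memory-bandwidth-bound,
# since each generated token streams the active weights from RAM once.
# All numbers below are illustrative assumptions, not measurements.

def ddr5_bandwidth_gbs(mt_per_s: int, channels: int = 2, bus_bits: int = 64) -> float:
    """Peak theoretical bandwidth (GB/s) for a DDR5 configuration."""
    return mt_per_s * 1e6 * channels * (bus_bits / 8) / 1e9

def tokens_per_sec(model_gb: float, bandwidth_gbs: float, efficiency: float = 0.7) -> float:
    """Crude upper bound on decode tokens/s if each token reads the model once."""
    return bandwidth_gbs * efficiency / model_gb

for speed in (5600, 6400):
    bw = ddr5_bandwidth_gbs(speed)
    # Hypothetical 100 GB quantized model filling most of 128 GB:
    print(f"DDR5-{speed}: {bw:.1f} GB/s peak, ~{tokens_per_sec(100, bw):.2f} tok/s")
```

Either way you land well under 1 token/s for a model that size, which is why the 5600-vs-6400 gap barely changes usability.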
This is a frequent topic of discussion for gamers because some people obsess over optimizing their RAM speed and timings and pay large premiums for RAM with CAS latency of 28 instead of 36. Then they see benchmarks showing 1-2% differences in games or even most productivity apps and realize they would have been better spending that extra money on the next faster GPU or CPU or other part.
Oh absolutely. Just mentioned it since I was very close to buying it back then, and now it's completely bonkers.
That bundle deal is quite well priced all things considered, it basically prices the memory where it was. Again, sadly no great bundle deals here.
I would not be surprised if we see casualties in adjacent markets, such as motherboards, coolers and whatnot.
Just reading now that they went out of production half a year ago which is a shame. I was very impressed being able to upgrade with the same motherboard 6 years down the line.
Other than the speed, it's a very good reason to go with AMD; the upgrade scope is massive. On AM5 you can go from a 6-core all the way, soon, to a 24-core with the new Zen 6.
Here's hoping to more developments like TurboQuant to improve LLM memory efficiency.
Su said that typically, the first quarter (Q1) is slower due to seasonal patterns, but AMD has seen its data center business expand from Q4 into Q1, demonstrating ongoing strength across both CPUs and GPUs. This growth underscores the company’s ability to capitalize on rising demand for AI compute and enterprise workloads, even during traditionally quieter periods.
“We are going into a big inflection year here in 2026. The CPU business is absolutely on fire.”
[1]: https://stocktwits.com/news-articles/markets/equity/amd-ceo-...
(cheapest at $1240 USD)
Nah, those of us who already bought DDR5 memory also already bought decent CPUs. Dropping another $1k for these incremental gains would be silly. It'd make a lot more sense if DDR5 had been around longer so that people had the option to make generational upgrades to this CPU but DDR5 on AMD has only been around for Zen4 and Zen5.
I hope this is still enough for the planned upgrade to Zen7 in 2028.
I really want to see what enabling the L3 cache options in the BIOS do from a NUMA standpoint. I have some projects I want to work on where being able to even just simulate NUMA subdivisions would be highly useful.
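If you want to see how the scheduler-visible topology changes when you flip those BIOS options, Linux exposes the L3 sharing domains in sysfs. A small sketch to dump them (standard sysfs paths, Linux-only; assumes the usual `cache/index*` layout):

```python
# Dump which logical CPUs share each L3 cache, from Linux sysfs.
# Handy for checking whether a BIOS "L3 as NUMA domain" option changed
# what the OS sees. Linux-only; paths are the standard sysfs layout.
from pathlib import Path

def l3_domains() -> dict:
    """Map each L3 sharing set (as a cpulist string) to the CPUs reporting it."""
    domains: dict = {}
    for cpu in sorted(Path("/sys/devices/system/cpu").glob("cpu[0-9]*")):
        for idx in cpu.glob("cache/index*"):
            if (idx / "level").read_text().strip() == "3":
                shared = (idx / "shared_cpu_list").read_text().strip()
                domains.setdefault(shared, []).append(cpu.name)
    return domains

if __name__ == "__main__":
    for shared, cpus in l3_domains().items():
        print(f"L3 shared by CPUs {shared} ({len(cpus)} threads)")
```

`numactl --hardware` gives the complementary NUMA-node view once the BIOS option is enabled.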
While I was aiming at 128, I settled for 96GB, because any more than 2 sticks means a sharp drop in RAM clocks this generation.
Feeling pretty chuffed now XD (though still sad because building a new PC is dumb when RAM costs more than a 24 core monster CPU)
The not so good side is that getting an RVA23 development board this year with a usable amount of RAM (e.g. for compiling and linking large code bases) is not going to be cheap.
I am fine with my 2 year old 128GB DDR4 for now. I will just upgrade the 14700K to 14900KS CPU and wait 2 more years.
Judging by the benchmarks newer CPUs aren't much better for multithreading workloads than 14900KS anyway, so it doesn't make a lot of sense to upgrade to newer CPUs, DDR5 and a new mobo.
It was an expensive mistake, as I bought a few options to experiment with, including a NUC and an M4 Mac Mini, but eventually bought a 9800X3D + 5070 Ti PC for <$2 and, for no reason in particular, a 64GB DDR5-6000 kit for $200 in August or so. I checked recently and that kit is pushing $1000. I also bought a 4080 laptop with a 64GB kit and an extra SSD for it last year.
That's pretty lucky given what's happened since. I don't claim any kind of foresight about what would happen.
I do kind of want to take the parts I have and build another AM4 PC. The 5900XT is not a bad option with 16 cores for ~$300 but my DDR4 RAM is almost useless because the best deals now are for combos of CPU + motherboard + RAM at steep discounts.
You can get some good deals on prebuilts still. Not as good as 6+ months ago but still not bad. Costco has a 5080 PC for $2300. There's no way I'm going overboard and building a 128GB+ PC right now.
I've seen multiple RAM spikes. We had one at the height of the crypto hysteria IIRC but this is significantly worse and is also impacting SSDs. I kinda wish I'd bought 1-2 4TB+ SSDs last year but oh well.
We're really waiting for the AI bubble to pop. Part of me thinks that'll be in the next year, but it could stay irrational substantially longer than that.
I upgraded my UPS to a sine interactive unit to minimise the risk of it dying to bad power while the market is so crazy...
It's probably not possible architecturally, but it would be amusing to see an entire early 90's OS running entirely in the CPU's cache.
I imagine for such a workload you could always solder on a small memory chip to avoid wasting L3 on unused memory, plus you'd need a non-standard boot process, so probably not.
Lots of optimizations happening to make a trading model as small as possible.
The membrane keyboard wasn’t great (the lack of a space bar was a weird choice) but it did work. We had programs on cassette and did get the 16 Kbyte memory expansion.
https://en.wikipedia.org/wiki/Timex_Sinclair_1000
I didn’t realize the Atari 2600 had BASIC; I always thought of it as a game console.
Edit: Also this 192MB of L3 is spread across two Zen CCDs, so it's not as simple as "throw it all in L3" either, because any given core would only have access to half of that.
Nice demo, bad model. The funny part is that an entire OS can fit in cache now, the hard part is making the rest of the system act like that matters.
* https://en.wikipedia.org/wiki/Commodore_PET
Same time as the Trash-80 and BBC micro were making inroads.
There’s actually already two running (MINIX and UEFI), and it’s the opposite of amusing - https://www.zdnet.com/article/minix-intels-hidden-in-chip-op...
If you run a VM on a CPU like this, using a baremetal hypervisor, you can get very close to "everything in cache".
Consider a VM where that kind of stuff has been removed, like the firecracker hypervisor used for AWS Lambda. You're talking milliseconds.
The lower leakage currents at lower voltages allowed them to implement a far more aggressive clock curve from the factory. That's where the higher allcore clock comes from (+30W TDP)
I'm not complaining at all, I think this is an excellent way to leverage binning to sell leftover cache.
Though if I may complain, Ars used to actually write about such things in their articles instead of speculate in a way that suspiciously resembles what an AI would write.
It depends on the task. For some memory-bound tasks the extra cache is very helpful. For CFD and other simulation workloads the benefits are huge.
For other tasks it doesn't help at all.
If someone wants a simple gaming CPU or general purpose CPU they don't need to spend the money for this. They don't need the 16-core CPU at all. The 9850X3D is a better buy for most users who aren't frequently doing a lot of highly parallel work.
If your tasks don’t benefit then don’t buy it.
But stop claiming that it doesn’t help anywhere because that’s simply wrong. I do some FEA work occasionally and the extra cache is a HUGE help.
There are also a lot of non-LLM AI workloads with models in a size range that fits into this cache.
See https://www.phoronix.com/review/amd-ryzen-9-9950x3d-linux/10
> Here is the side-by-side of the Ryzen 9 9950X vs. 9950X3D for showing the areas where 3D V-Cache really is helpful:
Incidentally, it looks like they filtered to the benchmarks with differences greater than 2%. The biggest speedup is 58.1%, and that's with 3D V-Cache on only half the chip.
I’m curious to see whether the same benchmarks benefit again so greatly.
So for 9950X3D half of the cores use a small L3 cache.
For applications that use all 16 cores, the cases where X3D2 provides a great benefit will be much more frequent than for a hypothetical CPU where the same cache increase would have been applied to a unified L3 cache.
The threads that happen to be scheduled on the 2nd chiplet will have a 3 times bigger L3 cache, which can enhance their performance a lot and many applications may have synchronization points where they wait for the slowest thread to finish a task, so the speed of the slowest thread may have a lot of influence on the performance.
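That "wait for the slowest thread" effect is easy to see in miniature: under barrier synchronization, the step time is the max of the per-thread times, not the mean. A toy sketch (the thread times are assumed numbers, purely for illustration):

```python
# Toy model of a barrier-synchronized step: every thread must finish
# before any can proceed, so one cache-starved straggler gates them all.
def step_time(thread_times: list) -> float:
    """Time for one step when all threads meet at a barrier."""
    return max(thread_times)

# 15 threads at 1.0 time units plus one straggler at 1.6 (assumed numbers):
times = [1.0] * 15 + [1.6]
print(f"mean thread time: {sum(times) / len(times):.3f}")  # ~1.038
print(f"actual step time: {step_time(times):.3f}")         # 1.600
```

If the bigger L3 on the second CCD speeds up what would otherwise be the straggler, the whole step gets faster even though most threads were never cache-starved.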
Agree. The article's 2nd para notes "AMD relies on its driver software to make sure that software that benefits from the extra cache is run on the V-Cache-enabled CPU cores, which usually works well but is occasionally error-prone." - in regard to the older, mixed-cache-size chips.
> I'm curious to see...
Yeah - though I don't expect current-day Ars Technica will bother digging that deep. It could take some very specialized benchmarks to show such large gains.
My criticism of the lazy writers may seem outsized, but I grew up reading and learning from the much better version of Ars, one I used to subscribe to.
I might even shell out for an upgrade to AM5 and DDR5. On the other hand, my 5900X is still blazing fast.
For comparison, the 9950X3D has a total cache of 144MB.
It is indeed 8MB per compute die but really 1MB per core. Not shared among the entire CCD.
And that answer is good enough for most workloads. You should stop reading now.
_______________________
The complex answer is that there is some ability for one CCD to pull cachelines from the other CCD. But I've never been able to find a solid answer on the limitations of this. I know it can pull a dirty cache line from another CCD's L1/L2 (this is the core-to-core latency test you often see in benchmarks, and there is an obvious cross-die latency hit).
But I'm not sure it can pull a clean cacheline from another CCD at all, or if those just get redirected to main memory (as the latency to main memory isn't that much higher than between CCDs). And even if it can pull a clean cacheline, I'm not sure it can pull them from another CCD's L3 (which is an eviction cache, so only holds clean cachelines).
The only way for a cacheline to get into a CCD's L3 is to be evicted from an L2 on that core, so if a dataset is active across both CCDs, it will end up duplicated across both L3s. Cachelines evicted from one L3 do NOT end up in another L3, so an idle CCD can't act as a pseudo L4.
I haven't seen anyone make a benchmark which would show the effect, if it exists.
When the L3 sizes are different across CCDs, the special AMD driver is needed to keep threads pinned to the larger-L3 CCD and prevent them from being placed on the small-L3 CCD, where their memory requests would lean on the other CCD's L3 as an L4. The driver reduces CCD-to-CCD data requests by keeping programs contained in one CCD.
With equal L3 caches, when a process spills onto the second CCD it will still use the first CCD's L3 as an "L4", but it no longer has to evict that data at the same rate as on the lopsided models. Additionally, the first CCD can use the second CCD's L3 in kind, reducing the number of requests that need to go to main memory.
The same sized L3s reduce contention to the IO die and the larger sized L3s reduce memory contention, it's a win-win.
https://www.phoronix.com/review/amd-3d-vcache-optimizer-9950...
For gaming, AMD already pins the game threads to the CCD with the extra cache pretty well.
For multi-threaded workloads the gain from having cache on both CCDs is quite small.
There are many applications which need synchronization between threads, so the speed of the slowest thread has a disproportionate influence on the performance.
In such applications, on the X3D2 the slowest thread has a 3 times bigger cache than on the X3D. That can make a lot of difference.
So there will be applications with no difference in performance, but also applications with a very large difference in performance, equal to the best performance differences shown by X3D vs. plain 9950X.
Now, would I replace the slightly slower processor in an existing computer with it? Probably not.
If they are stacked then why not 9800X3D2?
Would be neat to have an additional cache layer of ~1 GB of HBM on the package but I guess there's no way that happens in the consumer space any time soon.
But to do it literally - I'm not a low-level motherboard EE, but I'd bet you're looking at 5 to 7 figures (US $) of engineering work, to get around all the ways in which that would violate assumptions baked into the designs of the CPU, support chips, firmwares, etc.