Only for chat sessions, not for agentic coding. It's just too slow to be practical: 10 minutes to answer a simple question about a 2k LoC project, and that's with a 5070 add-on card.

by ac2915 hours ago


This article is about an MoE model with only 4B active parameters; it shouldn't take 10 minutes to answer a question about a small project.

I measured a 4-bit quant of this model at 1300 t/s prefill and ~60 t/s decode on a Ryzen 395+.
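To put those rates in perspective, here is a rough back-of-the-envelope sketch of single-request latency. The function name `estimate_latency_s`, the ~10 tokens per line of code, and the 500-token answer are all assumptions for illustration, not figures from this thread; only the 1300 t/s and ~60 t/s rates come from the measurement above.

```python
def estimate_latency_s(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Rough single-request latency: prompt processing plus generation."""
    prefill_s = prompt_tokens / prefill_tps
    decode_s = output_tokens / decode_tps
    return prefill_s + decode_s


# Assumptions (not from the thread): ~10 tokens per line of code, so a
# 2k LoC project is ~20,000 prompt tokens, and the answer is 500 tokens.
prompt_tokens = 2_000 * 10
output_tokens = 500

# Rates quoted above: 1300 t/s prefill, ~60 t/s decode.
total = estimate_latency_s(prompt_tokens, output_tokens, 1300, 60)
print(f"{total:.1f} s")  # prints "23.7 s"
```

Under these assumptions a single pass over the whole project is on the order of tens of seconds. An agentic tool usually makes several round trips, so real wall-clock time would be some multiple of that, but likely still well short of 10 minutes at these rates.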

by nl18 hours ago


Doesn't the Framework Desktop have a Ryzen 395 AI? That's a unified memory architecture, like the Macs.

by pshirshov 8 hours ago


Ah, forgot to add: it's not really "unified"; you have to explicitly specify your allocations. You may have a reasonably good 48 GB chunk assigned to the GPU, but that DDR5 is 5-10 times slower than GDDR/HBM, and the GPU itself isn't stellar.

So, Framework laptops are great for chatting but nearly useless for agentic coding.

My Radeon W7900 answers a question ("what is this project") in 2 minutes. My Framework 16 takes around 11 minutes with the 5070 add-on and around 23 minutes without it (Qwen 3.5 27B, Claude Code).
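The memory-speed argument above can be sketched with a common rule of thumb: decode is roughly bandwidth-bound, because every generated token has to read all active weights once. The function name `decode_tps_upper_bound`, the 4-bit quantization of the 27B model, and both bandwidth figures are assumptions picked for illustration; they are placeholders, not measurements of any hardware mentioned here.

```python
def decode_tps_upper_bound(active_params_billion, bits_per_weight, bandwidth_gb_s):
    """Bandwidth-bound ceiling on decode speed, in tokens per second.

    Assumes each generated token reads every active weight once and
    ignores KV-cache traffic, compute limits, and other overhead.
    """
    gb_read_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


# Hypothetical setup: a dense 27B model at 4 bits per weight (~13.5 GB
# read per token). The bandwidth values are illustrative placeholders.
slow = decode_tps_upper_bound(27, 4, bandwidth_gb_s=100)
fast = decode_tps_upper_bound(27, 4, bandwidth_gb_s=800)
print(f"slow memory: {slow:.1f} t/s, fast memory: {fast:.1f} t/s")
```

With these placeholder numbers the ceiling scales linearly with bandwidth, so memory that is 8 times faster gives an 8 times higher decode ceiling. The same formula also suggests why a MoE model with 4B active parameters decodes much faster than a dense 27B model on the same memory.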

by pshirshov 10 hours ago


That's discrete DDR5; it's not as fast as regular VRAM.