I've also been running Qwen 3.6 35B A3B on my Windows laptop (64 GB RAM, 4 GB GPU), and it's at least tolerable. It's not fast (a few tokens per second, slower than reading speed), but I can give it a task and come back later. That was a $600 laptop off eBay a few years ago, not a $6,000 machine.

Do these unified-memory Macs and giant 24 GB desktop GPUs actually achieve dozens or hundreds of tokens per second, commensurate with their 10x-20x cost?

reply
35B A3B runs at ~100 tokens a second on the best M5 Max GPU setup.
reply
I got around 50-60 t/s on my M3 Max, so 100 t/s seems very realistic for a chip two generations newer with double the RAM.
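
For the cost question upthread, a rough back-of-the-envelope sketch in Python. The $600 and $6,000 prices and the ~100 t/s figure come from this thread; reading "a few tokens per second" as 3 t/s is my assumption, and `dollars_per_tps` is just an illustrative helper name.

```python
def dollars_per_tps(price_usd: float, tokens_per_second: float) -> float:
    """Purchase price divided by generation speed (lower is better)."""
    return price_usd / tokens_per_second

# Figures quoted in the thread; 3 t/s is an assumed reading of "a few".
laptop = dollars_per_tps(600, 3)
mac = dollars_per_tps(6000, 100)

print(f"laptop: ${laptop:.0f} per t/s")
print(f"mac:    ${mac:.0f} per t/s")
print(f"speedup {100 / 3:.0f}x for {6000 / 600:.0f}x the price")
```

By this crude metric the expensive machine comes out ahead per token per second, but it ignores whether you need the speed at all, which was the original point.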
reply
What response speed (t/s) are you getting?

The full 128 GB is surely helpful for keeping browsers, editors, and other things running, since in my experience even 20-35 GB models plus their K/V caches can eat up a lot of the base 64 GB.
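
A minimal sketch of that memory arithmetic, in Python. It assumes weight size is roughly parameters times bits per weight divided by 8; the 8 GB K/V cache figure is a placeholder I picked for illustration, since the real number depends on the model architecture and context length. The function names are hypothetical.

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: parameters * bits / 8 bits per byte."""
    return params_billions * bits_per_weight / 8

def headroom_gb(total_ram_gb: float, model_gb: float, kv_cache_gb: float) -> float:
    """RAM left for the OS, browser, and editor after model and K/V cache."""
    return total_ram_gb - model_gb - kv_cache_gb

q4 = weights_gb(35, 4)  # 35B parameters at 4 bits per weight
q8 = weights_gb(35, 8)  # 35B parameters at 8 bits per weight

# kv_cache_gb=8 is an assumed placeholder, not a measured value.
for ram in (64, 128):
    left = headroom_gb(ram, q8, kv_cache_gb=8)
    print(f"{ram} GB machine, 8-bit weights: about {left:.0f} GB left over")
```

The 4-bit and 8-bit estimates (about 17.5 GB and 35 GB) roughly bracket the 20-35 GB range mentioned above.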

reply