- Your RTX 6000 is closer to $10k now
- Sparks are creeping into the $4-5k range
- AMD Strix Halo machines are ~$3.5k
- Apple depends on chipset and memory. The sweet spot would be a 128 GB M3 Ultra, probably $6-8k, but admittedly I haven't been tracking closely. A new M5 might come in the fall. You can get a new 128 GB M5 Max laptop for ~$5-6k today.
- A 4x3090 rig would cost $5-6k
Every platform has tradeoffs, but they mostly come down to ecosystem, memory bandwidth, and power consumption. They're all slow. The best option is likely to rent hardware on Runpod. The ROI on self-hosting is very low unless you have a specific need or you're OK treating it as a hobby.
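Why "they're all slow": single-user token generation is usually memory-bandwidth-bound, because each new token has to stream roughly all of a dense model's weights out of memory. A rough ceiling is therefore bandwidth divided by weight size. This is a back-of-the-envelope sketch, assuming a dense model at batch size 1 and ignoring KV-cache traffic and compute limits; the 70B / 4.5-bit model is a hypothetical example, and the bandwidths are nominal spec-sheet figures, so real numbers will be lower.

```python
def max_decode_tokens_per_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Rough upper bound on single-stream token generation speed.

    Assumes every generated token reads all model weights from memory once
    (dense model, batch size 1); ignores KV-cache reads and compute limits.
    """
    return bandwidth_gb_s / weights_gb

# Hypothetical model: ~70B dense parameters quantized to ~4.5 bits per weight
params_billion = 70
bits_per_weight = 4.5
weights_gb = params_billion * bits_per_weight / 8  # ~39.4 GB of weights

# Nominal peak memory bandwidth in GB/s (spec-sheet figures; real-world is lower)
NOMINAL_BANDWIDTH_GB_S = {
    "Strix Halo (256-bit LPDDR5X-8000)": 256,
    "DGX Spark": 273,
    "M3 Ultra": 819,
}

for name, bw in NOMINAL_BANDWIDTH_GB_S.items():
    print(f"{name}: <= {max_decode_tokens_per_s(bw, weights_gb):.1f} tok/s")
```

This prints ceilings of about 6.5, 6.9, and 20.8 tok/s respectively. Mixture-of-experts models only read their active experts per token, which is why they feel much faster on these boxes than dense models of the same total size.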
>The best option is likely to rent hardware on Runpod.
Vast.ai is much cheaper, but the broader point here is contestable. The only dimension in which cloud GPU rentals win is cost. You lose the confidentiality, integrity, and availability benefits of local deployments.
Not that I'd encourage anyone to throw large amounts of money at getting access to LLMs, but you're definitely going to be better off buying something that you can amortize over multiple years with a multi-year warranty.
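The amortization argument can be made concrete with a simple break-even estimate: owning wins once the hours you actually use the hardware exceed the net purchase cost divided by the hourly savings versus renting. A minimal sketch; apart from the ~$10k RTX 6000 price from upthread, every number here (resale value, power draw, electricity price, rental rate) is a hypothetical placeholder, not a quote from Runpod or Vast.ai.

```python
def break_even_hours(purchase_usd: float, resale_usd: float,
                     power_kw: float, usd_per_kwh: float,
                     rent_usd_per_hour: float) -> float:
    """Hours of use after which owning costs less than renting.

    Owning: (purchase - resale) up front, plus electricity per hour of use.
    Renting: a flat hourly rate. Ignores warranty, downtime, and privacy value.
    """
    saving_per_hour = rent_usd_per_hour - power_kw * usd_per_kwh
    if saving_per_hour <= 0:
        return float("inf")  # renting is cheaper than your own electricity bill
    return (purchase_usd - resale_usd) / saving_per_hour

# Hypothetical inputs: $10k card, $4k resale later, 0.5 kW draw,
# $0.20/kWh electricity, $1.10/hr for a comparable rented GPU.
hours = break_even_hours(10_000, 4_000, 0.5, 0.20, 1.10)
print(f"break-even after ~{hours:.0f} hours "
      f"(~{hours / (8 * 365):.1f} years at 8 h/day)")
```

With these placeholder inputs the break-even is about 6,000 hours, roughly two years at 8 hours a day, which is why a multi-year warranty matters to the argument.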
This whole thing is really starting to remind me of the crypto hype phases of 2016-2018 when everyone thought their investment in GPUs was going to make them rich.
Yes, LLMs are sloppy, and local models usually more so (but things change fast).
But the local ones have one big advantage: they are private. So you can safely feed them the collection of your private documents and things you wouldn't trust people like sama with. The fact that some people do not care is one of the failures of our educational system.
In October 2024 I got my Mac Studio M1 Ultra with 128 GB; IIRC it was ~$2500. With the recent price explosion, it has certainly gotten more expensive. https://frame.work/ is selling a 128 GB Strix Halo mainboard for $2700, but you have to add storage and a case.
Non-NVIDIA backends tend to get less support and new features land slower, or features that are expected to improve performance wind up hurting it instead. That sort of thing.
For basic “token in/token out” workloads without fine-tuning, it’s probably fine?
Unfortunately, prices on these rose a lot, but unevenly. Beelink GTR 9 Pro is $4400 and Framework Desktop is ~$3500, for what is basically the exact same mainboard as a Bosgame M5 for $2800.
Apple's M5 Max is another attractive option. Apple silicon traditionally had great memory bandwidth (MBW) and was good at token generation (TG) but struggled with prompt processing (PP); the new neural accelerators in the GPU cores have made a big difference here, in a good way.
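The TG/PP split comes from different bottlenecks: TG is mostly limited by memory bandwidth, while PP (prefill) processes the whole prompt in parallel and is mostly limited by compute. A common rule of thumb is about 2 FLOPs per parameter per token for a forward pass. This sketch assumes that approximation and a dense model; the effective TFLOPS values are hypothetical, not measured figures for any Apple chip.

```python
def prefill_seconds(params: float, prompt_tokens: int, effective_tflops: float) -> float:
    """Rough compute-bound prefill time: ~2 FLOPs per parameter per prompt token."""
    total_flops = 2 * params * prompt_tokens
    return total_flops / (effective_tflops * 1e12)

# Hypothetical: 70B dense model, 8,000-token prompt (e.g. a pasted document)
for tflops in (30, 100):  # hypothetical sustained throughput, before vs after faster matmul units
    print(f"{tflops} TFLOPS: ~{prefill_seconds(70e9, 8_000, tflops):.1f} s before the first token")
```

Under these assumptions the wait before the first token drops from about 37 s to about 11 s, even though TG speed (bandwidth-bound) would not change at all.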
Gorgon Halo is rumored for a June announcement with a Q4'26 release, with basically +100 MHz clocks over Strix Halo and LPDDR5X-8533 instead of LPDDR5X-8000, but more importantly, 192 GB max instead of 128 GB.
I'd say it's better to wait for Gorgon Halo than to grab Strix Halo now. However, Medusa Halo, rumored for H2'27, is slated to have up to 26 Zen 6 cores (heterogeneous cores; kind of funny that AMD is heading towards these as Intel retreats from them), 48 CUs of RDNA 5 instead of 40 CUs of RDNA 3.5, and a 384-bit bus with LPDDR6, which should allow 256 GB at roughly 490-600 GB/s of MBW. That will really make Strix and Gorgon Halo obsolete.
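The bandwidth figures follow from simple arithmetic: peak bandwidth is bus width in bytes times transfers per second. A sketch, assuming Gorgon Halo keeps Strix Halo's 256-bit bus (not confirmed) and treating LPDDR6 with the same simple formula, even though its channel layout (including metadata bits) may shave off some effective bandwidth. Rather than guess Medusa Halo's memory speed, the second function works backwards from the rumored 490-600 GB/s to the transfer rate it would imply.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: float) -> float:
    """Theoretical peak DRAM bandwidth: bytes per transfer x million transfers/s."""
    return bus_width_bits / 8 * transfer_rate_mt_s / 1000

def implied_transfer_rate_mt_s(bandwidth_gb_s: float, bus_width_bits: int) -> float:
    """Transfer rate (MT/s) needed to reach a given bandwidth on a given bus width."""
    return bandwidth_gb_s * 1000 * 8 / bus_width_bits

print(f"Strix Halo, 256-bit LPDDR5X-8000: {peak_bandwidth_gb_s(256, 8000):.0f} GB/s")
print(f"Gorgon Halo, 256-bit LPDDR5X-8533 (assumed bus): {peak_bandwidth_gb_s(256, 8533):.0f} GB/s")
for target in (490, 600):  # rumored Medusa Halo range on a 384-bit bus
    print(f"{target} GB/s on 384-bit needs ~{implied_transfer_rate_mt_s(target, 384):.0f} MT/s")
```

So Gorgon Halo's memory bump is only ~7% more bandwidth (256 vs ~273 GB/s), while the rumored Medusa Halo figures would need roughly 10,200-12,500 MT/s LPDDR6 to roughly double it.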
Also worth keeping an eye out for Serpent Lake (Intel CPU + NVIDIA iGPU on a single board with unified memory, rumored for 2028-2029 IIRC) and the 160 GB Crescent Island Intel dGPU.