In other words, yes, buying this kind of machine only to run an LLM locally doesn't make sense, because local LLMs generally still suck for serious programming work (they work great for spam filtering though!). But more generally this machine makes sense for a lot of people.
* yes, you can run it on an older/smaller GPU plus system RAM but performance will suffer
* if you want optimal GPU performance you need the model plus context in VRAM, so 24GB (3090, 4090) or 32GB (5090) cards, plus a reasonably powerful system to plug them into. Ideally you'd have multiple cards working together, but for optimal performance this means either 2x 3090 or Nvidia's workstation cards.
* you can go for a 128GB Strix Halo system, but the memory bandwidth isn't great and they're becoming increasingly expensive (5.5k EUR for an HP laptop, 3.9k EUR for a GMKtec EVO-X2 mini PC)
* you can go for a 128GB DGX Spark (5k EUR+), which also has unspectacular memory bandwidth, or RTX Spark (price unclear but probably not cheaper)
* or go for a Mac with a decent CPU and a good amount of RAM (bandwidth varies by model, but is typically a bit better than Strix Halo/DGX Spark and worse than dedicated GPUs).
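To put rough numbers on the "model plus context in VRAM" and memory bandwidth points above, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function names are made up, the layer/head shapes describe a hypothetical 27B-class dense model rather than any specific release, and the bandwidth figures are round placeholders, not spec-sheet values. The decode estimate uses the common rule of thumb that generating one token reads all active weights once, so it is an upper bound that ignores compute and overhead.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """KV cache size: one key and one value vector per layer, KV head and token."""
    values = 2 * layers * kv_heads * head_dim * context_tokens
    return values * bytes_per_value / 1e9


def decode_ceiling_tps(bandwidth_gb_s: float, active_weights_gb: float) -> float:
    """Rough upper bound on tokens/s when decoding is memory-bandwidth bound."""
    return bandwidth_gb_s / active_weights_gb


# Hypothetical 27B dense model at ~4.5 bits per weight (assumed quantization).
w = weights_gb(27, 4.5)
# Hypothetical shapes: 64 layers, 8 KV heads, head_dim 128, 32k context, fp16 cache.
kv = kv_cache_gb(layers=64, kv_heads=8, head_dim=128, context_tokens=32768)

print(f"weights: {w:.1f} GB, KV cache: {kv:.1f} GB, total: {w + kv:.1f} GB")
# Placeholder bandwidths: a unified-memory box vs. a high-end discrete GPU.
for label, bw in [("~250 GB/s", 250), ("~1000 GB/s", 1000)]:
    print(f"{label}: at most {decode_ceiling_tps(bw, w):.0f} tokens/s")
```

With these assumed shapes the weights and a 32k-token cache together land just under 24GB, which is why context length matters as much as parameter count when picking a card.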
As usual with such questions, there are of course cheaper paths (if you want to accept the tradeoffs) but Macs are reasonable vs. competition for these workloads.
I wasn't really expecting much from these local open-weight models, either when it comes to speed or "intelligence", but my preconceptions were quickly put to shame when I got ollama up and running and pulled my first model. I get a consistent 117-128 t/s with Gemma4:26b-a4b without any tuning (just the default settings), which was much faster than I had expected. Can't wait to dive deeper into this, especially with Qwen3.6 models.
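For anyone wanting to reproduce a t/s figure like this, a minimal sketch against Ollama's local HTTP API follows. It assumes the server is listening on the default `http://localhost:11434` and relies on the `eval_count` and `eval_duration` (nanoseconds) fields that `/api/generate` returns for a non-streaming request. The helper names `tokens_per_second` and `measure` are made up for this example.

```python
import json
import urllib.request


def tokens_per_second(response: dict) -> float:
    """Generation speed: eval_count tokens over eval_duration nanoseconds."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def measure(model: str, prompt: str,
            host: str = "http://localhost:11434") -> float:
    """Run one non-streaming generation and return its tokens/s."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode()
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        return tokens_per_second(json.load(resp))
```

Calling `measure("gemma4:26b-a4b", "Explain what a KV cache is.")` would time a single prompt. The model tag is copied from the comment above and may need adjusting to whatever `ollama list` shows locally. A single run is noisy, so averaging a few prompts gives a fairer number.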
Does anyone have experience adding a 2nd Nvidia GPU of the same generation but a different (slower) model in the same system? Will it give a major boost with larger models, or will the slower card just be a bottleneck? I have an unused RTX 5060 Ti 16GB that I'm considering installing alongside the RTX 5080, but it would necessitate removing some other hardware, so I haven't bothered yet.
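One rough way to reason about the mixed-card question is a toy model, not a benchmark. It assumes a dense model whose layers are split across devices, that each token passes through the devices one after another, and that decoding is purely memory-bandwidth bound, so per-token times simply add up. The bandwidth numbers are round placeholders for "fast card", "slower card" and "system RAM", not measured or spec-sheet values, and `decode_tps` is a made-up helper.

```python
def decode_tps(placement: list[tuple[float, float]]) -> float:
    """Estimate tokens/s for weights spread over several memory pools.

    placement: (gb_of_weights_held, bandwidth_gb_per_s) per device.
    Each token reads every pool once, sequentially, so the times add.
    """
    seconds_per_token = sum(gb / bw for gb, bw in placement)
    return 1.0 / seconds_per_token


# Hypothetical 24 GB model, too big for a single 16 GB card.
# Option A: split evenly across a fast and a slower 16 GB GPU.
both_gpus = decode_tps([(12, 900), (12, 450)])
# Option B: fill the fast GPU and spill the remaining 8 GB to system RAM.
spill_to_ram = decode_tps([(16, 900), (8, 80)])

print(f"two GPUs: ~{both_gpus:.1f} t/s, GPU + system RAM: ~{spill_to_ram:.1f} t/s")
```

Under these assumptions the slower card does drag down the average, but it still comes out well ahead of spilling to system RAM, because the slowest pool dominates the per-token time. Real results also depend on PCIe transfers, compute, and how the runtime splits the layers, none of which this sketch models.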
You get fewer tokens per second, but at some point the balance between quality and quantity makes the large model size worth the spend.
When you're spending this kind of money, you may as well treat yourself to a pretty screen and some decent speakers. Nothing the competition doesn't offer these days, but you get them for free with the car-priced RAM upgrade so why go for less.
Personally when going on the road I like portability (14" MBP or MBA), but at home I want raw non-thermally throttled power.
$5k for DGX Spark as well.
I spent less than $4k. OEM boxes are better for cooling, there's no Apple markup, and I get a real Linux system for stuff like k3s.
No Apple markup, but you get the Nvidia markup instead. Prior to the recent Apple price increase due to the RAM shortage, an M5 Max 128GB was a bargain if you wanted to run local LLMs.
You need an expensive motherboard, cooling, PSU(s) to use multiple high end GPUs together. Then there is the noise and the fact that you can't bring it on an airplane.
I've run comparisons against everything that's available on OpenRouter (well, as of a few weeks ago), and for $0/tok, the local 27B Qwen can't be beat. Sure, it's slower, and yeah, the office is a few degrees warmer than it ought to be -- but nobody can pull the plug, nobody is watching over my shoulder, and the results are on par with SOTA.
Can't wait for a similarly sized Qwen 3.7 - from what I've seen so far, it's a leap ahead of the previous version.
Builds and local test runs are 3 times faster than the Windows laptop option. The machine will pay for itself just based on that within 3 months. I can spin up a local kubernetes cluster and do full integration tests while I am working on other things as well.
It isn’t strictly a Mac vs Windows thing though. It looks like the culprit is the MDM software on the Windows machines, which is just crazy slow and constantly getting in the way.
If I was paid less it would definitely make less sense for the company to pay for this machine.
Imagine its value if war broke out over Taiwan / Greater China, or really any of the dark scenarios with global connectivity or the truthiness of commercially available models. It is a very, very difficult piece of equipment to make at any other moment in history. I wish I could have purchased more. I saw the signs and price trends and out of stocks as they unfolded. No doubt others with the means are stockpiling.
There is not a period in the history of computing where this is true of consumer hardware over a decade for anything other than hardware already at the very bottom of its depreciation curve. It is surprising to me that you state that as an obvious assumption.
I suppose if your base case is Taiwan war that may be true, but there's a lot of folks who seem to be assuming the current hardware crunch will go on indefinitely when the natural state of hardware is getting cheaper over time.
Yes. Your people earn an order of magnitude less income than Americans.
Yes. Back in my days at $faang in Europe it was not uncommon to hear of people getting 120-160 k€/year in compensation, and we were “poor” compared to US engineers at the same FAANG (400-500 k$/year total compensation) with a bit of seniority…