by doodlesdev1 days ago |

by jwr5 hours ago|

Well, I can tell you how my thinking goes: 1) I don't buy my computer just to run LLMs and there are many scenarios where I benefit from both a decent GPU and from a large amount of RAM, 2) I run a solo-founder business which owns exactly one computer in the entire company so it might as well be a good one, and 3) I don't need a new car, so comparing pricing this way is irrelevant.

In other words, yes, buying this kind of machine only to run an LLM locally doesn't make sense, because local LLMs generally still suck for serious programming work (they work great for spam filtering though!). But more generally this machine makes sense for a lot of people.

by JeremyNT1 days ago|

I also don't understand why people in this price bracket are buying Mac laptops instead of desktop computers with GPUs? Just to flex that it's portable?

by mft_22 hours ago|

(I'm not one of the people you're speaking of with a 128gb M5 but) if you want to run one of the medium-sized open-weights models (Qwen 27b, 35b, Gemma 4 26b, 31b) or larger, you get into an interesting optimisation space.

* yes, you can run it on an older/smaller GPU plus system RAM but performance will suffer

* if you want optimal GPU performance you need the model in VRAM plus context, so 24GB (3090, 4090) or 32GB (5090) cards, plus a system that's reasonably powerful to plug them into. Ideally you'd have multiple cards working together, but for optimal performance this means either 2x 3090 or nvidia's workstation cards.

* you can go for a 128gb Strix Halo system, but the memory bandwidth isn't great and they're becoming increasingly expensive (5.5k EUR for HP laptop, 3.9k EUR for GMKtec EVO-X2 mini PC)

* you can go for a 128gb DGX Spark (5k EUR+) which also has unspectacular memory bandwidth or RTX Spark (price unclear but probably not cheaper)

* or go for a Mac with a decent CPU and a good amount of RAM (bandwidth varies by model, but typically a bit better than Strix Halo/DGX Spark and worse than bespoke GPUs).

As usual with such questions, there are of course cheaper paths (if you want to accept the tradeoffs) but Macs are reasonable vs. competition for these workloads.
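To make the "model in VRAM plus context" point concrete, here is a rough back-of-the-envelope sketch in Python. The layer count, KV head count, head size, quantisation width and overhead figure are all made-up illustrative values, not the real shape of any Qwen or Gemma model, so treat the output as an order-of-magnitude guide only.

```python
def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_value: int = 2) -> float:
    """Approximate KV cache size: one key and one value vector per layer per token."""
    total_bytes = 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value
    return total_bytes / 2**30


def fits(vram_gib: float, params_billion: float, bits_per_weight: float,
         kv_gib: float, overhead_gib: float = 1.5) -> bool:
    """True if weights + KV cache + a fixed overhead allowance fit in VRAM."""
    needed = weights_gib(params_billion, bits_per_weight) + kv_gib + overhead_gib
    return needed <= vram_gib


# Hypothetical 27B model at ~4.5 bits per weight with a 32k-token context.
# The shape values below (48 layers, 8 KV heads, head size 128) are assumptions.
kv = kv_cache_gib(layers=48, kv_heads=8, head_dim=128, context_tokens=32768)
for card_gib in (16, 24, 32):
    print(card_gib, fits(card_gib, params_billion=27, bits_per_weight=4.5, kv_gib=kv))
```

Under these assumed numbers the weights come to roughly 14 GiB and the cache to 6 GiB, which is why a 16GB card falls short while a 24GB card has some headroom.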

by edb_1235 hours ago|

I just recently got into experimenting with local LLMs, having built myself a new desktop system anyway (for non-LLM reasons) with an Intel Ultra 270K-Plus and RTX 5080, with 64GB system RAM and 16GB VRAM. Relatively speaking, a high-performing and low-to-moderate cost system.

I wasn't really expecting much from these local open weight models, either when it comes to speed or "intelligence", but my preconceptions were quickly put to shame when I got ollama up and running and pulled my first model. I get a consistent 117-128 t/s with Gemma4:26b-a4b without any tuning (just the default settings), which was much faster than I had expected. Can't wait to dive deeper into this, especially with Qwen3.6 models.

Does anyone have experience adding a 2nd Nvidia GPU of the same generation but a different (slower) model in the same system? Will it give a major boost with larger models, or will the slower card just be a bottleneck? I have an unused RTX 5060 Ti 16GB that I'm considering installing alongside the RTX 5080, but it would necessitate removing some other hardware, so I haven't bothered yet.

by skolos3 hours ago|

I'd say adding another 16GB GPU would be worth it - you'd be able to run a larger model/larger context entirely on the GPUs. It would give you more options for what you can run fast. Your current model probably doesn't run completely from the GPU (depending on quants, I don't think you can squeeze Gemma4:26b into 16GB of VRAM), so you already have some layers running on the GPU and some on the CPU. If you add another GPU you might be able to move all layers to VRAM, which should speed things up for you. Each layer is computed on whichever device holds it, so the layers that are already on your RTX 5080 would compute the same, but the layers that your CPU currently handles would be computed with the faster VRAM/compute of the RTX 5060 Ti.
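A toy sketch of that layer-placement reasoning, in Python. It is not how ollama or llama.cpp actually decide the split; the greedy fill, the 40 equal layers of 0.5 GiB each, and the 14 GiB of usable VRAM per card are all assumptions picked to illustrate the idea.

```python
def place_layers(layer_gib: float, n_layers: int,
                 devices: list[tuple[str, float]]) -> dict[str, int]:
    """Greedily fill each device in order; layers that do not fit stay on the CPU."""
    placement: dict[str, int] = {}
    remaining = n_layers
    for name, free_gib in devices:
        count = min(remaining, int(free_gib // layer_gib))
        placement[name] = count
        remaining -= count
    placement["cpu"] = remaining
    return placement


# Hypothetical model: 40 layers of 0.5 GiB each (20 GiB total).
# Assume about 14 GiB of each 16GB card is usable after context and overhead.
one_gpu = place_layers(0.5, 40, [("rtx5080", 14.0)])
two_gpus = place_layers(0.5, 40, [("rtx5080", 14.0), ("rtx5060ti", 14.0)])
print(one_gpu)
print(two_gpus)
```

With one card, 12 of the 40 layers spill to the CPU; with the second card they all land in VRAM, and the layers on the first card are unaffected.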

by pletnes13 hours ago|

And with a mac, there are no cuda drivers to fiddle with.

by girvo13 hours ago|

But prompt processing is terrible

by jeroenhd1 days ago|

A mac with a boatload of RAM can run models that will exceed the limits of any GPU not worth at least twice the Apple hardware itself.

You get fewer tokens per second, but at some point the balance between quality and quantity makes the large model size worth the spend.

When you're spending this kind of money, you may as well treat yourself to a pretty screen and some decent speakers. Nothing the competition doesn't offer these days, but you get them for free with the car-priced RAM upgrade so why go for less.

by ctkhn23 hours ago|

I don't even travel a ton but portability is huge. It's not a flex, it's a functional thing that lets me move around within my house or work while I'm at my parents or traveling or anywhere else. Other than my media collection that lives on my home server, I want most of my files to come with me on my laptop.

by FuckButtons18 hours ago|

The fact that I can take it with me? That I don’t need internet to still have access to deepseek? The fact that electricity is expensive and an mbp uses ~10% of the power that an equivalent vram setup would use with GPUs. Also, in order to get the same vram I would need to spend a similar amount, but wouldn’t also have a machine that was useful for other workloads that need a huge amount of ram.

by indemnity18 hours ago|

Potentially going to sound privileged here, but why not both?

Personally when going on the road I like portability (14" MBP or MBA), but at home I want raw non-thermally throttled power.

by LeBit1 days ago|

I think it is because desktop computers with GPUs with enough VRAM to run interesting models are insanely expensive, hard to source and consume a lot of electricity and dissipate a lot of heat.

by ilogik1 days ago|

What GPU can I buy with >100GB of memory?

by verdverm1 days ago|

DGX Spark is one, but really depends on how much you want to spend

by aurareturn22 hours ago|

273GB/s bandwidth vs 614 GB/s of the M5 Max. And you're getting a whole laptop.

$5k for DGX Spark as well.
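Why the bandwidth figures matter: a common rule of thumb is that token generation is memory-bound, because each new token has to stream the active weights from memory once. A minimal Python sketch of that upper bound follows. The 15 GB of weights read per token is an assumed figure, and real throughput will be lower since this ignores compute, KV cache reads and overhead.

```python
def decode_tokens_per_second(bandwidth_gb_s: float, active_weights_gb: float) -> float:
    """Upper bound on generation speed if every token streams the weights once."""
    return bandwidth_gb_s / active_weights_gb


ACTIVE_WEIGHTS_GB = 15.0  # assumption: weights touched per generated token

spark = decode_tokens_per_second(273, ACTIVE_WEIGHTS_GB)
m5_max = decode_tokens_per_second(614, ACTIVE_WEIGHTS_GB)
speedup = m5_max / spark

print(f"DGX Spark bound: {spark:.1f} t/s")
print(f"M5 Max bound:    {m5_max:.1f} t/s")
print(f"ratio:           {speedup:.2f}x")
```

The ratio does not depend on the assumed model size: it is just 614 / 273, or about 2.25x.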

by verdverm21 hours ago|

Prompt processing time is better on the spark, which aligns more with coding (more reading than writing).

I spent less than $4k, the OEM boxes are better for cooling, there's no apple markup, and I get a real Linux system for stuff like k3s.
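The "more reading than writing" argument can be sketched as a simple two-phase latency model in Python. Every speed and token count below is invented for illustration; "machine_a" and "machine_b" are placeholders, not measurements of a Spark or a Mac.

```python
def request_seconds(prompt_tokens: int, output_tokens: int,
                    prefill_tps: float, decode_tps: float) -> float:
    """Total time = time to read the prompt + time to generate the answer."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps


# Assumed speeds: machine_a reads prompts fast but generates slowly,
# machine_b is the other way round.
machine_a = {"prefill_tps": 2000.0, "decode_tps": 20.0}
machine_b = {"prefill_tps": 500.0, "decode_tps": 45.0}

# Coding-style request: large context in, short patch out.
coding_a = request_seconds(30_000, 300, **machine_a)
coding_b = request_seconds(30_000, 300, **machine_b)

# Chat-style request: short question in, long answer out.
chat_a = request_seconds(200, 1_000, **machine_a)
chat_b = request_seconds(200, 1_000, **machine_b)

print(f"coding: a={coding_a:.1f}s b={coding_b:.1f}s")
print(f"chat:   a={chat_a:.1f}s b={chat_b:.1f}s")
```

With these made-up numbers the fast-prefill machine wins the coding-style request and loses the chat-style one, so which box is "faster" depends on the workload mix.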

by aurareturn19 hours ago|

Yes, it's better on the Spark but the M5 is a lot closer than before with neural accelerators. After prompt processing, token generation speed on the M5 Max is 2.3x faster.

No Apple markup but you get the Nvidia markup instead. Prior to the recent Apple price increase due to the RAM shortage, an M5 Max 128GB was a bargain if you wanted to run local LLMs.

by verdverm40 minutes ago|

I can get 2.5 Sparks for the price of the M5, and will have better throughput and access to bigger models (more vram when running tensor parallel)

by bastardoperator23 hours ago|

I have a bunch of computers and gadgets, why settle on one?

by satvikpendem18 hours ago|

Unified memory.

by redox991 days ago|

Yeah, it's a much better idea to buy many used 3090s. 4090s or 5090s if you can afford it. Way faster.

by aurareturn22 hours ago|

Probably depends on what you're trying to do.

You need an expensive motherboard, cooling, PSU(s) to use multiple high end GPUs together. Then there is the noise and the fact that you can't bring it on an airplane.

by btbuildem22 hours ago|

I think it's silly to go for a laptop form factor. Last fall I put together a workstation with two second-hand 3090s in it (paid $850CDN each, now the best I can find is $1200). With 48GB VRAM it's reasonable - and I've been using Qwen 3.6 27B for various tasks around building KGs from text corpora / reasoning about them.

I've run comparisons against everything that's available on OpenRouter (well, as of a few weeks ago), and for $0/tok, the local 27B Qwen can't be beat. Sure, it's slower, and yeah, the office is a few degrees warmer than it ought to be -- but nobody can pull the plug, nobody is watching over my shoulder, and the results are on par with SOTA.

Can't wait for a similarly sized Qwen 3.7 - from what I've seen so far, it's a leap ahead of the previous version.

by Gigachad20 hours ago|

I think it still makes sense to wait. Hardware is currently hyper expensive and cloud models are subsidized. Waiting 2 years or so, until memory prices have dropped and datacenters start wanting a profit, would get you a usable setup that's more economical.

by whichquestion20 hours ago|

How much electricity does running your local models take?

by alemanek16 hours ago|

If your workflow benefits from the speed it quickly pays for itself when factoring in developer salaries here in the US. I recently switched companies and they bought me an M5 Max 128GB as my dev machine.

Builds and local test runs are 3 times faster than the Windows laptop option. The machine will pay for itself just based on that within 3 months. I can spin up a local kubernetes cluster and do full integration tests while I am working on other things as well.

It isn’t strictly a Mac vs Windows thing though. It looks like the culprit is the MDM software on the Windows machines, which is just crazy slow and constantly getting in the way.

If I was paid less it would definitely make less sense for the company to pay for this machine.

by v1ne12 hours ago|

Don't worry. Once IT Security discovers that they miss their trusty endpoint security products on your Mac, they'll add it and you'll be in the same ballpark as the Windows machine. Been there, received that, and learnt that Microsoft Defender exists for macOS, too.

by __MatrixMan__5 hours ago|

That's why you gotta have your work laptop, and your "work" laptop. Give them something to secure. It makes them feel important.

by ntcho6 hours ago|

Gosh, the last sentence is the most terrifying thing I’ve read today.

by reilly300023 hours ago|

It’s an asset on my balance sheet that’s already appreciating nicely and will likely be resale-able for what I paid for it for the next 7-10 years. I am on an Apple monthly installment plan so $5k is $416/month for 1 year, no interest. I’m able to run DS4 scale models and other open models without quantization, often multiple at once.

Imagine its value if war broke out over Taiwan / Greater China, or really any of the dark scenarios with global connectivity or the truthiness of commercially available models. It is a very, very difficult piece of equipment to make at any other moment in history. I wish I could have purchased more. I saw the signs and price trends and out of stocks as they unfolded. No doubt others with the means are stockpiling.

by simplyluke22 hours ago|

> will likely be resale-able for what I paid for it for the next 7-10 years

There is not a period in the history of computing where this is true of consumer hardware over a decade for anything other than hardware already at the very bottom of its depreciation curve. It is surprising to me that you state that as an obvious assumption.

I suppose if your base case is Taiwan war that may be true, but there's a lot of folks who seem to be assuming the current hardware crunch will go on indefinitely when the natural state of hardware is getting cheaper over time.

by bellowsgulch1 days ago|

> Are developers in other countries living in such different worlds?

Yes. Your people earn an order of magnitude less income than Americans.

by adamors1 days ago|

Yes they are, 6k is peanuts to a lot of people.

by verdverm1 days ago|

It's not always about the price or being the cheapest. For me, it's about freedom, both to play and from the govt/corp censorship.

by znpy1 days ago|

> Are developers in other countries living in such different worlds?

Yes. Back in my days at $faang in Europe it was not uncommon to hear of people getting 120-160 k€/year in compensation, and we were “poor” compared to US engineers at the same faang (4-500 k$/year total compensation) with a bit of seniority…

by doodlesdev23 hours ago|

That makes a lot of sense! I have no idea how I'd use that much money, so maybe the 128gb MBP for messing around with local LLMs wouldn't sound so absurd :)
