by gcr 12 hours ago

by tempoponet 11 hours ago

Expect to pay $4k-10k

- Your RTX 6000 is closer to $10k now

- Sparks are creeping into the $4-5k range

- AMD Strix Halo systems are ~$3.5k

- Apple depends on chipset and memory. Sweet spot would be 128gb M3 Ultra, probably $6-8k but admittedly haven't been tracking closely. New M5 might come in the fall. You can get a new 128gb M5 Max laptop for ~5-6k today.

- a 4x3090 rig would take $5-6k

Every platform has tradeoffs, but it's mostly ecosystem, memory bandwidth, and power consumption. They're all slow. The best option is likely to rent hardware on Runpod. The ROI on self-hosting is very low unless you have a specific need or you're ok treating it as a hobby.

by bahmboo 6 hours ago

$2600 gets you an MBP M5 Pro with 48GB. 64GB requires a Max, which bumps it to $4200, at which point you may as well spend the extra $800 to go to 128GB.

by anonym29 11 hours ago

Bosgame M5 (Strix Halo) w/ 128 GB still goes for $2800 right now. SH systems have surged in price dramatically but quite unevenly.

>The best option is likely to rent hardware on Runpod.

Vast.ai is much cheaper, but the broader point here is contestable. The only dimension in which cloud GPU rentals win is cost. You lose the confidentiality, integrity, and availability benefits of local deployments.

by ai_fry_ur_brain 11 hours ago

Rentals are priced to pay themselves off in 1-1.5 years (when renting them out per hour, not selling tokens). It's never a better option to rent.

Not that I'd encourage anyone to throw large amounts of money at getting access to LLMs, but you're definitely going to be better off buying something that you can amortize over multiple years with a multi-year warranty.
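One way to sanity-check the rent-versus-buy argument is a simple break-even calculation. The sketch below is only illustrative: the function names, the $1.00/hour rental rate, and the zero default power cost are assumptions made up for the example, not figures from this thread or from any provider's price list.

```python
def breakeven_hours(purchase_price, rental_rate_per_hour, power_cost_per_hour=0.0):
    """Hours of use after which buying has cost the same as renting.

    All inputs are hypothetical; plug in your own quotes.
    """
    # Each hour you run your own hardware saves the rental fee,
    # minus whatever you pay for electricity in that hour.
    saving_per_hour = rental_rate_per_hour - power_cost_per_hour
    if saving_per_hour <= 0:
        raise ValueError("owning never catches up: power cost >= rental rate")
    return purchase_price / saving_per_hour


def breakeven_years(purchase_price, rental_rate_per_hour, hours_per_day,
                    power_cost_per_hour=0.0):
    """Break-even point in years for a given daily utilization."""
    hours = breakeven_hours(purchase_price, rental_rate_per_hour, power_cost_per_hour)
    return hours / (hours_per_day * 365)


if __name__ == "__main__":
    # Assumed numbers: a $10k card versus a made-up $1.00/hour rental.
    for hours_per_day in (24, 8, 4):
        years = breakeven_years(10_000, 1.00, hours_per_day)
        print(f"{hours_per_day:>2} h/day -> break-even after {years:.1f} years")
```

With those assumed numbers, buying breaks even after roughly 1.1 years at 24 hours a day, but only after roughly 6.8 years at 4 hours a day, so the answer depends heavily on how many hours the hardware is actually busy.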

by ai_fry_ur_brain 11 hours ago

And for what? Spend 10-15k for the sloppiest of slop code, non-deterministic automations, and the ability to spawn an AI gf?

This whole thing is really starting to remind me of the crypto hype phases of 2016-2018 when everyone thought their investment in GPUs was going to make them rich.

by organsnyder 11 hours ago

It is possible to get real work done with LLMs. There are plenty of ethical concerns, and they're definitely over-hyped, but they are exceptionally useful tools when used well.

by dvfjsdhgfv 9 hours ago

I upvoted your comment even though I disagree with you.

Yes, LLMs are sloppy, and local models usually more so (but things change fast).

But the local ones have one big advantage: they are private. So you can safely feed them the collection of your private documents and things you wouldn't trust people like sama with. The fact that some people do not care is one of the failures of our educational system.

by gamander2 9 hours ago

These models contain a wealth of knowledge that is being censored, not just deliberately but also through training-data bias. Fine-tuning and steering can produce unexpected new insights, for example from a model that is trained to believe so-called "conspiracy theories", which many believe to be the ground truth.

by smcleod 4 hours ago

Right now it's really the M5 Max MacBook Pro 128GB. The RTX 6000 is a nice card, but you'd need more than one of them, and you have to have a desktop to suit. The DGX Spark is slow and has pretty limited software support.

by embedding-shape 11 hours ago

If I could find an RTX Pro 6000 for $5K I'd definitely grab it. I'm running RedHatAI/Qwen3.6-35B-A3B-NVFP4 on one (I had to pay closer to $10K for it, though) with 260K context and it's a blast! ds4 by antirez also works well, and even IQ2XXS seems to work relatively well, but Qwen3.6-35B-A3B-NVFP4 is both faster and gives higher-quality responses (at least for coding and translations, which is what I mostly use them for).

by tandr 3 hours ago

Don't mind me asking, but where did you find a $5k RTX 6000? Even the 48GB model (previous gen) shows a minimum of 7k, and the 96GB one (Blackwell) is ~10k on Amazon...

by tarruda 12 hours ago

> What’s the price point for getting into that sweet spot?

In October 2024 I got my Mac Studio M1 Ultra with 128GB; IIRC it was ~$2500. With the recent price explosion, it has certainly gotten more expensive. https://frame.work/ is selling a 128GB Strix Halo mainboard for $2700, but you have to add storage and a case.

by ttoinou 12 hours ago

M5 Max with 64GB (the sweet spot) or 128GB (only 1000 USD more, and better to have for the future) is the best price-to-quality ratio: future-proof, reliable, resellable, and flexible across workloads. Being harder to use as a server might be the only drawback.

by throwaw12 12 hours ago

What do you recommend for a non-Mac setup? I am a Mac user, but it's getting expensive, and I'm not seeing a reason to jump to the latest M5.

by barbacoa 9 hours ago

Try looking into the Ryzen AI Max 395. AMD made a CPU/GPU SoC with unified memory specifically for AI inference. You can buy mini PCs with up to 128GB of RAM.

by krzyk 7 hours ago

Isn't CUDA/NVIDIA the go-to solution for most local models, with the rest being second-class citizens?

by gcr 6 hours ago

Depends. ROCm is pretty well-supported for example.

Non-NVIDIA backends tend to get less support and new features land slower, or features that are expected to improve performance wind up hurting it instead. That sort of thing.

For basic “token in/token out” workloads without fine-tuning, it’s probably fine.

by simple10 8 hours ago

The Ryzen AI Max 395 with 128GB is super cool, but not fast for inference: an order of magnitude slower than a dedicated GPU, but at half the cost. You can run larger models on it, but it's slow. Great for local async work. Not great for daily chat or as a code agent driver.

by throwa356262 8 hours ago

The latest NPUs are pretty fast, I think what is missing is more optimised software support.

by plagiarist 7 hours ago

The VRAM bandwidth is at least as much of a problem as compute on these; there is a lot of data to shuffle around.
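To put a rough number on the bandwidth point: during token generation, every active weight has to be read from memory once per token, so memory bandwidth divided by the bytes read per token gives an upper bound on tokens per second. The sketch below is a back-of-the-envelope estimate under that assumption only. It ignores KV-cache reads, compute limits, and software overhead, and the 70B dense model at 4 bits per weight is a hypothetical example. The ~220 GB/s figure is the real-world Strix Halo number quoted elsewhere in this thread.

```python
def max_decode_tokens_per_s(bandwidth_gb_s, active_params_billions, bits_per_weight):
    """Upper bound on generation speed for a memory-bandwidth-bound model.

    bandwidth_gb_s:          sustained memory bandwidth in GB/s
    active_params_billions:  parameters touched per token, in billions
                             (all of them for a dense model, fewer for MoE)
    bits_per_weight:         quantization width, e.g. 16, 8, or 4
    """
    # Billions of parameters * bytes per parameter = GB read per token.
    gb_read_per_token = active_params_billions * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


if __name__ == "__main__":
    # Hypothetical dense 70B model at 4 bits on ~220 GB/s of bandwidth.
    print(f"{max_decode_tokens_per_s(220, 70, 4):.1f} tokens/s at best")
    # Hypothetical MoE model with 3B active parameters at 4 bits.
    print(f"{max_decode_tokens_per_s(220, 3, 4):.1f} tokens/s at best")
```

Real throughput lands below these bounds, but the ratio between two machines tends to track the ratio of their memory bandwidths, which is why bandwidth gets so much attention in these comparisons.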

by varispeed 11 hours ago

A comparable non-Mac setup would probably be a Threadripper, but that gets much more expensive. My view is that Apple products are actually the cheapest on the market when it comes to performance.

by roger_ 12 hours ago

M5 Max 128GB for $1k?

by tempoponet 11 hours ago

The memory upgrade is $1k on a MacBook Pro. The laptop is ~$5500.

by smallerize 11 hours ago

I think they mean the upgrade to 128GB is +$1k.

by anonym29 12 hours ago

Strix Halo at $2k with similar TG and about half the PP of DGX Spark was a pretty good deal IMO, especially considering it's also a full x86 system... 16c/32t Zen 5, 40 CU RDNA 3.5, 128 GB unified memory at ~220 GB/s real-world speeds (256 GB/s theoretical) - that runs full tilt at 140W in performance mode and idles at ~10W.

Unfortunately, the prices rose on these a lot, but unevenly. Beelink GTR 9 Pro is $4400, Framework Desktop is ~$3500, for what is basically the exact same mainboard as a Bosgame M5 for $2800.

Apple's M5 Max is another attractive option. Apple silicon traditionally had great MBW and was good at TG, but struggled with PP, but the new neural engines in those GPU cores have made a big difference in a good way here.

Gorgon Halo is rumored for June announcement with Q4'26 release with basically +100 MHz clocks on Strix Halo, LPDDR5X-8533 instead of LPDDR5X-8000, but more importantly, 192 GB max instead of 128 GB.

I'd say it's better to wait for Gorgon Halo than to grab Strix Halo now. However, Medusa Halo, rumored for H2'27, is slated to have up to 26c Zen 6 (heterogeneous cores; kind of funny that AMD is heading towards these as Intel retreats from them), 48 CU of RDNA 5 instead of 40 CU RDNA 3.5, and a 384-bit bus w/ LPDDR6, which should make 256 GB at more like ~490-600 GB/s MBW, which will really make Strix and Gorgon Halo obsolete.

Also worth keeping an eye out for Serpent Lake (Intel CPU + NVIDIA iGPU on a single board with unified memory, rumored for 2028-2029 iirc), and for the 160 GB Crescent Island Intel dGPU.
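The theoretical bandwidth figures in this comment follow from transfer rate times bus width. A minimal sketch of that arithmetic is below. The 256-bit bus for Strix Halo and Gorgon Halo is an assumption that is not stated in the comment (it is the width that makes LPDDR5X-8000 come out at the quoted 256 GB/s), and the function names are made up for the example.

```python
def peak_bandwidth_gb_s(transfer_rate_mt_s, bus_width_bits):
    """Theoretical peak memory bandwidth in GB/s.

    transfer_rate_mt_s: memory speed in megatransfers per second
                        (e.g. 8000 for LPDDR5X-8000)
    bus_width_bits:     total memory bus width in bits
    """
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mt_s * bytes_per_transfer / 1000  # MB/s -> GB/s


def required_rate_mt_s(bandwidth_gb_s, bus_width_bits):
    """Transfer rate needed to reach a target bandwidth on a given bus."""
    return bandwidth_gb_s * 1000 * 8 / bus_width_bits


if __name__ == "__main__":
    # Assumed 256-bit bus for both parts.
    print("LPDDR5X-8000:", peak_bandwidth_gb_s(8000, 256), "GB/s")
    print("LPDDR5X-8533:", peak_bandwidth_gb_s(8533, 256), "GB/s")
    # The rumored 384-bit bus at the quoted ~490-600 GB/s range.
    for target in (490, 600):
        print(f"{target} GB/s on 384 bits needs about",
              round(required_rate_mt_s(target, 384)), "MT/s")
```

Under those assumptions the LPDDR5X-8533 bump is worth about 273 GB/s theoretical, a small step compared with what a wider 384-bit bus would bring.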
