"Full quality" being a relative assessment, here. You're still deeply compute constrained, that machine would crawl at longer contexts.
reply
70B dense models are way behind SOTA. Even the aforementioned Kimi 2.5 has fewer active parameters than that, and it's quantized to int4 on top of that. We're at a point where some near-frontier models may run out of the box on Mac Mini-grade hardware, with perhaps no real need to even upgrade to the Mac Studio.
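A back-of-the-envelope sketch of the memory traffic behind that comparison. The 32B active-parameter figure is an assumption for illustration, not a published spec, and the full (much larger) expert set still has to fit in memory:

```python
# Bytes of weights read per generated token, which is roughly what bounds
# decode speed on memory-bandwidth-limited machines.

def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB at a given precision."""
    return params * bits_per_weight / 8 / 1e9

dense_70b_fp16 = weight_gb(70e9, 16)   # classic 70B dense model at fp16
moe_active_int4 = weight_gb(32e9, 4)   # assumed ~32B active params at int4

print(f"70B dense @ fp16  : ~{dense_70b_fp16:.0f} GB touched per token")
print(f"MoE active @ int4 : ~{moe_active_int4:.0f} GB touched per token")
```

That's roughly an order of magnitude less weight traffic per token, which is why the MoE-plus-int4 combination changes what counts as runnable hardware.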
reply
>may

I'm completely over these hypotheticals and 'testing grade' setups.

I know Nvidia VRAM works, not some marketing about 'integrated RAM'. Heck, look at /r/locallama/: there's a reason it's entirely Nvidia.

reply
Are you an NVIDIA fanboy?

This is a _remarkably_ aggressive comment!

reply
Which, while expensive, is dirt cheap compared to a comparable Nvidia or AMD system.
reply
It's still very expensive compared to using the hosted models, which are currently massively subsidised. You have to wonder what the fair market price for these hosted models will be after the free money dries up.
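For a sense of scale, a toy break-even calculation; every figure below is a placeholder assumption, not a real price or measured usage:

```python
# Toy break-even estimate: local hardware vs. a hosted API.
# All numbers are placeholder assumptions for illustration only.

hardware_cost_usd = 5_000         # assumed up-front cost of a high-memory local box
api_price_per_mtok_usd = 3.00     # assumed blended $/million tokens from a hosted API
tokens_per_month = 50_000_000     # assumed monthly usage (50M tokens)

monthly_api_cost = tokens_per_month / 1e6 * api_price_per_mtok_usd
breakeven_months = hardware_cost_usd / monthly_api_cost

print(f"Hosted API: ~${monthly_api_cost:,.0f}/month at this usage")
print(f"Local box pays for itself after ~{breakeven_months:.0f} months "
      "(ignoring power, depreciation, and the subsidy question entirely)")
```

At those made-up numbers the local box takes years to pay off, and that's against today's subsidised API prices; if the hosted prices rise, the math shifts toward local.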
reply
Inference is profitable. Maybe we hit a limit and we don't need as many expensive training runs in the future.
reply
Inference APIs are probably profitable, but I doubt the $20-$100 monthly plans are.
reply
For sure, Claude Code isn't profitable.
reply
What speed are you getting at that level of hardware though?
reply