70B dense models are way behind SOTA. Even the aforementioned Kimi 2.5 has fewer active parameters than that, and it's quantized to int4 on top of that. We're at a point where some near-frontier models may run out of the box on Mac Mini-grade hardware, with perhaps no real need to even upgrade to a Mac Studio.
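
To put rough numbers on that (my own back-of-envelope, with illustrative figures rather than anything from a spec sheet): weight memory scales with parameter count times bits per weight, which is why int4 quantization is what brings these models into unified-memory range. For MoE models the total parameter count still sets the memory footprint; the smaller active count mostly buys per-token speed.

  # Rough weight-memory estimate; figures below are illustrative, not official specs.
  def weight_gb(params_billion: float, bits_per_weight: float) -> float:
      # params (in billions) * bits / 8 gives bytes in billions, i.e. roughly GB
      return params_billion * bits_per_weight / 8

  print(weight_gb(70, 16))  # 70B dense at fp16 -> 140.0 GB
  print(weight_gb(70, 4))   # 70B dense at int4 -> 35.0 GB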
reply
>may

I'm completely over these hypotheticals and 'testing-grade' setups.

I know Nvidia VRAM works, not some marketing about 'integrated RAM'. Heck, look at /r/LocalLLaMA/. There's a reason it's entirely Nvidia.

reply
> Heck, look at /r/LocalLLaMA/. There's a reason it's entirely Nvidia.

That's simply not true. Nvidia may be relatively popular, but people use all sorts of hardware there. Here are a few recent self-reported setups pulled from comments:

- https://www.reddit.com/r/LocalLLaMA/comments/1qw15gl/comment...

- https://www.reddit.com/r/LocalLLaMA/comments/1qw0ogw/analysi...

- https://www.reddit.com/r/LocalLLaMA/comments/1qvwi21/need_he...

- https://www.reddit.com/r/LocalLLaMA/comments/1qvvf8y/demysti...

reply
Mmmm, not really. I have both a 4x 3090 box and an M1 Mac with 64 GB. I find that the Mac performs about the same as a 2x 3090 setup. That's nothing stellar, but you can run 70B models at decent quants with moderate context windows. Definitely useful for a lot of stuff.
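
As a sanity check on the "70B at decent quants" claim, here's a rough fit calculation; the layer/head numbers are assumptions for a Llama-70B-class model with GQA, not measurements from either machine:

  # Does a ~4.5 bit-per-weight 70B quant plus an 8K KV cache fit in ~48 GB (2x 3090)?
  layers, kv_heads, head_dim = 80, 8, 128                    # assumed GQA config
  kv_bytes_per_token = layers * 2 * kv_heads * head_dim * 2  # K and V, fp16
  weights_gb = 70 * 4.5 / 8                                  # ~39.4 GB at a typical Q4-ish quant
  kv_gb = kv_bytes_per_token * 8192 / 1e9                    # ~2.7 GB for an 8K-token context
  print(weights_gb + kv_gb)                                  # ~42 GB, fits with a little headroom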
reply
Are you an NVIDIA fanboy?

This is a _remarkably_ aggressive comment!

reply
Not at all. I don't even know why someone would be incentivized to promote Nvidia short of holding a large amount of stock. That said, I did stick my neck out suggesting we buy A6000s after the Apple M series didn't work out. To no one's surprise, the 2x A6000s did work.
reply