upvote
Quants will push it below 256GB without completely lobotomizing it.
reply
> without completely lobotomizing it

The question in case of quants is: will they lobotomize it beyond the point where it would be better to switch to a smaller model like GPT-OSS 120B that comes prequantized to ~60GB.

reply
In general, quantizing down to 6 bits gives no measurable loss in performance. Down to 4 bits gives small measurable loss in performance. It starts dropping faster at 3 bits, and at 1 bit it can fall below the performance of the next smaller model in the family (where families tend to have model sizes at factors of 4 in number of parameters)

So in the same family, you can generally quantize all the way down to 2 bits before you want to drop down to the next smaller model size.

Between families, there will obviously be more variation. You really need to have evals specific to your use case if you want to compare them, as there can be quite different performance on different types of problems between model families, and because of optimizing for benchmakrs it's really helpful to have your own to really test it out.

reply
> In general, quantizing down to 6 bits gives no measurable loss in performance.

...this can't be literally true or no one (including e.g. OpenAI) would use > 6 bits, right?

reply
Did you run say SWE Bench Verified? Where does this claim coming from? It's just an urban legend.
reply
Most certainly not, but the Unsloth MLX fits 256GB.
reply
Curious what the prefilled and token generation speed is. Apple hardware already seem embarrassingly slow for the prefill step, and OK with the token generation, but that's with way smaller models (1/4 size), so at this size? Might fit, but guessing it might be all but usable sadly.
reply
They're claiming 20+tps inference on a macbook with the unsloth quant.
reply
Yeah, I'm guessing the Mac users still aren't very fond of sharing the time the prefill takes, still. They usually only share the tok/s output, never the input.
reply
My hope is the Chinese will also soon release their own GPU for a reasonable price.
reply