upvote
Are there evidence that this approach helps maintain "accuracy" performance when quantized? It sounds a bit like mxfp4 with gpt-oss, which was a confusing model upon release.
reply
I have just been humbled by the Gemma 4 26B QAT build (unsloth's version), which insisted repeatedly that I am wrong in my requirements for some niche wordpress code, which cannot be satisfied.

I am a good WP developer so I kept prodding it and it kept insisting, and it explained with clarity. Turns out it is right and I was wrong, as I would have found out if I'd written the code myself.

I've been using this particular test for days, experimenting in ways to generate and prompt code. The 4-bit quantisation of the pre-QAT model does not catch this error. And nor can the Qwen 3.6 sparse model, which confidently blazed past it and never mentioned it.

(FWIW neither did plain ChatGPT; maybe Codex would)

Anecdotal, but there you go. I am somewhat weirded out by it.

reply
So what we want now is unsloth (or anyone) to release 4/6-bit quantized models of these releases?
reply
Yep, Unsloth already did, as linked in the comment at the top of this thread
reply