Note that not all quants are the same at a given BPW. The smol-IQ2_XS quant I linked is fairly dynamic: some tensors are q8_0, some q6_k and some q4_k, while the majority are iq2_xs. In my testing, this smol-IQ2_XS quant is the best available in this BPW range.
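To make the "same BPW, different quant" point concrete, here is a rough sketch of how a mixed-type quant's effective BPW falls out as a weighted average. The per-type bit widths are derived from the ggml block layouts as I understand them (bytes per block over weights per block). The tensor split in `example_mix` is purely hypothetical and is not the actual recipe of the smol-IQ2_XS quant.

```python
# Bits per weight per quant type: bytes per block * 8 / weights per block.
# Block sizes are taken from the ggml block structs as I understand them.
BPW = {
    "q8_0": 34 * 8 / 32,     # 8.5
    "q6_k": 210 * 8 / 256,   # 6.5625
    "q4_k": 144 * 8 / 256,   # 4.5
    "iq2_xs": 74 * 8 / 256,  # 2.3125
}


def effective_bpw(mix):
    """Weighted-average BPW; mix maps quant type -> fraction of all weights."""
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"fractions must sum to 1, got {total}")
    return sum(BPW[qtype] * frac for qtype, frac in mix.items())


# HYPOTHETICAL split, for illustration only (not the real smol-IQ2_XS recipe).
example_mix = {"iq2_xs": 0.90, "q4_k": 0.05, "q6_k": 0.03, "q8_0": 0.02}

if __name__ == "__main__":
    print(f"effective BPW: {effective_bpw(example_mix):.3f}")
```

Two quants can land on the same average with very different splits, which is why the headline BPW alone doesn't tell you which tensors were kept at higher precision.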
Eventually I might try a more practical eval such as terminal bench.
This is always the problem with the 2-bit and even 3-bit quants: they look promising in short sessions, but then you try to do real work and realize they’re a waste of time.
In my experience, running a smaller dense model like a 27B produces better results than 2-bit quants of larger models.
It would be nice to see a scientific assessment of that statement.
Anecdotally, I’ve been happier running Qwen3.5 27B at Q6 and living with the tradeoffs that come with it than running it at Q4.
They did reduce the number of experts, so maybe that was it?