This is always the problem with the 2-bit and even 3-bit quants: they look promising in short sessions, but then you try to do real work and realize they’re a waste of time.
In my experience, running a smaller dense model, such as a 27B, produces better results than a 2-bit quant of a larger model.
It would be nice to see a scientific assessment of that statement.
Anecdotally, for Qwen3.5 27B I’ve been happier running Q6 and accepting the tradeoffs that come with it than running Q4.
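For a rough sense of the memory side of that tradeoff, here's a back-of-the-envelope sketch. It assumes nominal 4 and 6 bits per weight (real quant formats carry some per-block overhead, so actual files come out somewhat larger) and ignores KV cache and activations entirely. `weight_gb` is just a hypothetical helper name, not something from any library.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Assumption: every weight is stored at exactly `bits_per_weight` bits,
    with no scale/zero-point overhead and no unquantized layers.
    """
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


# Nominal bit widths, used here only as an approximation.
q4 = weight_gb(27, 4)
q6 = weight_gb(27, 6)

print(f"27B at ~4 bits: {q4:.2f} GB")
print(f"27B at ~6 bits: {q6:.2f} GB")
print(f"extra for Q6:   {q6 - q4:.2f} GB")
```

So under these assumptions, going from Q4 to Q6 costs about half again as much memory for the weights alone, which is the main thing you're trading for the quality bump.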