Qwen seems better at one-shotting things from vague prompts to an acceptable degree, but that's literally not what I use these things for!
One thing for anyone who does play with it: it seems very, very sensitive to quantisation of the K part of the KV cache. Running F16 K with Q8 V got rid of a lot of the loops it was otherwise hitting.
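If you run it through llama.cpp, the K and V cache types can be set separately with `--cache-type-k` and `--cache-type-v`. Below is a small sketch that just builds such a command line. The binary name `llama-server`, the model path `model.gguf` and the helper `llama_server_cmd` are placeholders for illustration. `q8_0` is assumed to be the 8-bit cache type you want. Depending on your build, a quantised V cache may also need flash attention enabled, so check `--help` for your version.

```python
import shlex


def llama_server_cmd(model_path: str, k_type: str = "f16", v_type: str = "q8_0") -> list[str]:
    """Build a llama-server argv with separate K and V cache types.

    Hypothetical helper: defaults mirror the F16 K / Q8 V split,
    i.e. keep K at full precision and quantise only V.
    """
    return [
        "llama-server",
        "-m", model_path,
        "--cache-type-k", k_type,  # K cache left at f16 (the sensitive part)
        "--cache-type-v", v_type,  # V cache quantised to 8 bits
    ]


if __name__ == "__main__":
    # Print a shell-ready command for a placeholder model file.
    print(shlex.join(llama_server_cmd("model.gguf")))
```

Swap the types around if you want to test the sensitivity yourself, e.g. `k_type="q8_0"` with `v_type="f16"`.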
There's also a regression in llama.cpp wrt. Step Flash, where the exact same quants are now getting worse KLD and perplexity than they did previously. Very odd, but it's being looked into at least!
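For anyone not familiar with the two numbers, here's a rough sketch of what they measure, using the textbook definitions. It is not llama.cpp's actual implementation. It assumes you already have per-token logits from a full-precision run and a quantised run over the same text. Mean KLD measures how far the quant's next-token distribution drifts from the full-precision one. Perplexity is the exponentiated mean negative log-likelihood of the real next tokens. Lower is better for both. The function names are illustrative only.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (max-subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(p, q):
    """KL(P || Q) for two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_kld(base_logits, quant_logits):
    """Average per-token KLD of the quantised model from the base model."""
    total = 0.0
    for b, qz in zip(base_logits, quant_logits):
        total += kl_divergence(softmax(b), softmax(qz))
    return total / len(base_logits)


def perplexity(logits_per_token, target_ids):
    """exp of the mean negative log-likelihood of the actual next tokens."""
    nll = 0.0
    for logits, t in zip(logits_per_token, target_ids):
        nll -= math.log(softmax(logits)[t])
    return math.exp(nll / len(target_ids))
```

The same quant file scoring worse on these after an update, with nothing else changed, is why it looks like a regression in the inference code rather than in the quants themselves.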