undefined

points

by jychang16 hours ago |

comments

by tosh16 hours ago|

[-]

Unsloth have just released benchmarks on how their dynamic quants perform for Qwen 3.5

https://unsloth.ai/docs/models/qwen3.5/gguf-benchmarks

by jychang15 hours ago|

parent|

[-]

I'm aware of that, but that's not the link of the post. The post is linking to their UD 2.0 quants from a few months back.

Also, the benchmarks are because they messed up the first version of their Qwen 3.5 XL quants by quanting some tensors to mxfp4 that should have been in higher quality, and this is their bugfix. The post literally starts out with "We updated Qwen3.5-35B Unsloth Dynamic quants being SOTA on nearly all bits" without explaining WHY they needed to update from the original version.

by danielhanchen14 hours ago|

parent|

[-]

Didn't expect this to be on HN haha - but sometimes HN does have older posts come up sometimes.

No your conclusion is false - only the old Q4_K_XL had slightly higher perplexity, all other quants are fine. We uploaded 9TB of research artifacts to https://huggingface.co/unsloth/Qwen3.5-35B-A3B-Experiments-G... for the community.

If you read our blog, it says KLD and PPL are actually sometimes counterintuitive - for example MiniMax some of our quants do worse on PPL and KLD vs AesSedai's one for example, but does worse on LiveCodeBench by a lot see https://unsloth.ai/docs/models/qwen3.5/gguf-benchmarks#id-3-...

This is because see https://unsloth.ai/docs/models/qwen3.5/gguf-benchmarks#id-1-... - although bitwidths are in general monotonic ie q2_k < q3_k < q4_k < q5_k etc, we find KLD and PPL are actually not monotonic ie q3_k can actually have BETTER PPL than q4_k.

So the main point is bad luck on quantization - sometimes lower bits might get lower PPL and KLD, but actually this is a ruse and wrong, since on actual real world tasks, it's worse.

by jychang13 hours ago|

parent|

[-]

The Q4_K_XL is easily the most popular quant for the model, though.

So then why was Q4_K_XL having issues? Is it just a PPL issue that doesn't reflect in real world usage? If yes, why not just say that? "The Q4_K_XL had lower PPL, but don't worry, PPL can be wrong, and other benchmarks show it's fine". If it was a real quality issue, then where was the issue caused by?

The blog post says "Retiring MXFP4 from all GGUF quants: Q2_K_XL, Q3_K_XL and Q4_K_XL, except for pure MXFP4_MOE" but doesn't say why. The easy assumption that most people would make is "oh, you quanted attention or ssn or something to mxfp4 and that turned out to be bad, so you retire mxfp4" but if you say that it's not that, then what's the actual issue?

by segmondy5 hours ago|

parent|

[-]

each layer is made up of various weights, the weights are adjusted to quant it. a pure q8 will have all the weights as q8, or a q4 the same. but some are kept as f32, etc. here's an example of q3_k_xl - https://huggingface.co/unsloth/Kimi-K2-Thinking-GGUF/tree/ma... we can see certain weights are f32, q8, q5, q3, etc. They used mxfp4 in some weights and mxfp4 doesn't seem to place nicely in quants so that's why they are retiring it. read their publication again and it should make more sense.

by jychang2 hours ago|

parent|

[-]

I am aware of all that.

They literally never say “they used mxfp4 in some weights”. What you’re claiming they said doesn’t exist.

This isn’t a postmortem, it’s PR fluff without actually addressing the issue.

by lostmsu15 hours ago|

parent|

prev|

[-]

Looking at their benchmarks there doesn't appear to be meaningful difference between their quants and bartowsky quants.

by danielhanchen14 hours ago|

parent|

[-]

No our Qwen3.5 new ones show the opposite see https://unsloth.ai/docs/models/qwen3.5/gguf-benchmarks

by lostmsu8 hours ago|

parent|

[-]

Am I misreading the table?

  Unsloth Q4_K_M

  PPL:       6.6053     KLD 99.9%: 0.5478     KLD mean: 0.0192

  bartowski Qwen_Q4_K_M

  PPL:       6.6097     KLD 99.9%: 0.5771     KLD mean: 0.0182

Barely noticeable drop in PPL; noticeable KLD drop (good, 5%); but worse KLD mean (bad, 5%).

by danielhanchen14 hours ago|

prev|

[-]

Didn't expect this as well haha on HN again - probably related to Qwen3.5