


2 reasons.

First, it's not really "1-bit"; it's actually much closer to 2-bit. IQ1_M is 1.75 bpw and IQ2_XXS is 2.06 bpw. These figures come from ./llama-quantize --help, which lists most of the quant types and their size in bpw: https://pastebin.com/bCUqGfeE
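The gap between a nominal "1-bit" and the real bpw adds up quickly at this scale. Below is a rough size estimate. It is a minimal sketch that assumes every weight uses the same quant type and ignores GGUF metadata. The 100B parameter count is hypothetical and only for illustration.

```python
# Rough model-size estimate from bits per weight (bpw).
# Assumptions: one quant type for all weights, no metadata overhead.
# Real GGUF files mix tensor types, so this is only a ballpark.

BPW = {
    "nominal 1-bit": 1.0,  # what the "1 bit" label suggests
    "IQ1_M": 1.75,         # from llama-quantize --help
    "IQ2_XXS": 2.06,       # from llama-quantize --help (rounded)
}


def approx_size_gb(n_params: float, bpw: float) -> float:
    """Approximate size in GB (10**9 bytes) of n_params weights at bpw bits each."""
    return n_params * bpw / 8 / 1e9


if __name__ == "__main__":
    n_params = 100e9  # hypothetical 100B-parameter model
    for name, bpw in BPW.items():
        print(f"{name}: ~{approx_size_gb(n_params, bpw):.1f} GB")
```

Going from a true 1 bpw to 1.75 bpw means the file is 75% larger than the "1-bit" label implies.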

And to elaborate on the "dynamic" aspect incognito mentioned in the other comment, if you click on one of the .gguf files on Hugging Face:

https://huggingface.co/unsloth/GLM-5.2-GGUF/blob/main/UD-IQ1...

There are a lot of Q5_K, Q6_K, etc. tensors. Only the routed experts (ffn_gate_exps.weight, ffn_up_exps.weight, ffn_down_exps.weight) are heavily quantized, and for this model the down_proj experts actually look like IQ3_XXS.
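Here is a hypothetical sketch of what "per-tensor" quantization means for the overall bpw. It picks a quant type by tensor name, loosely mirroring what the file listing shows, and then computes the element-weighted average bpw. The selection rule, the choice of Q6_K for "everything else", and the tensor sizes are illustrative assumptions, not the actual recipe behind any published GGUF. The IQ3_XXS and Q6_K bpw values come from llama.cpp block sizes as I understand them.

```python
# Hypothetical per-tensor quant selection plus effective bpw.
# The rule and the sizes are assumptions for illustration only.

BPW = {
    "IQ1_M": 1.75,     # from llama-quantize --help
    "IQ3_XXS": 3.0625, # assumed: 98 bytes per 256-weight block
    "Q6_K": 6.5625,    # assumed: 210 bytes per 256-weight block
}


def choose_type(tensor_name: str) -> str:
    """Pick a quant type from the tensor name (hypothetical rule)."""
    if "ffn_down_exps" in tensor_name:
        return "IQ3_XXS"          # down_proj experts kept a bit higher
    if "ffn_gate_exps" in tensor_name or "ffn_up_exps" in tensor_name:
        return "IQ1_M"            # routed experts quantized hardest
    return "Q6_K"                 # attention, norms, shared layers, etc.


def effective_bpw(tensors: dict[str, int]) -> float:
    """Element-weighted average bpw over {tensor_name: element_count}."""
    total_bits = 0.0
    total_elems = 0
    for name, n_elems in tensors.items():
        total_bits += n_elems * BPW[choose_type(name)]
        total_elems += n_elems
    return total_bits / total_elems


if __name__ == "__main__":
    # Made-up element counts for a single block.
    layer = {
        "blk.0.ffn_gate_exps.weight": 1000,
        "blk.0.ffn_up_exps.weight": 1000,
        "blk.0.ffn_down_exps.weight": 1000,
        "blk.0.attn_q.weight": 100,
    }
    for name in layer:
        print(name, choose_type(name))
    print(f"effective bpw: {effective_bpw(layer):.3f}")
```

Even when most weights sit at 1.75 bpw, the higher-precision tensors pull the file-wide average up. This is one reason a "1-bit" dynamic quant ends up closer to 2 bpw overall.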

by incognito124 12 hours ago


Keyword: dynamic. The parameters are quantized on a case-by-case basis.