Kimi uses INT4 as its native format, so there's no such thing as "better than 4-bit precision" for that model. This is in contrast with GLM, for which 16-bit precision is native and 8-bit quants are in common use.
You’re right, but this creates a separate issue: providers then apply FP4 post-training quantization (PTQ), which is quite lossy. It reduces the model size and is optimized for Blackwell GPUs, at the (imo severe) cost of model quality.
The MI355X can run FP6 operations at the same speed as FP4 (unique to AMD), so people should be making MXFP6 quants, which would be pretty much lossless and much closer to FP4 performance than FP8.
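To see why MXFP6 would sit much closer to lossless than MXFP4, here's a rough Python sketch that quantizes the same Gaussian stand-in for a weight tensor to both formats and compares the RMS error. It assumes OCP MX-style blocks of 32 elements sharing a power-of-two scale, E2M1 elements for FP4 and E2M3 elements for FP6, and plain round-to-nearest with clamping at the largest representable value; real kernels may pick scales and break rounding ties differently, and the helper names are made up for illustration.

```python
import math
import random

def fp_grid(ebits, mbits):
    """Non-negative magnitudes of a small float format with no Inf/NaN encodings
    (as with the MX FP4 E2M1 and FP6 E2M3 element types)."""
    bias = 2 ** (ebits - 1) - 1
    vals = set()
    for e in range(2 ** ebits):
        for m in range(2 ** mbits):
            if e == 0:  # subnormal: no implicit leading 1
                vals.add((m / 2 ** mbits) * 2.0 ** (1 - bias))
            else:       # normal: implicit leading 1
                vals.add((1 + m / 2 ** mbits) * 2.0 ** (e - bias))
    return sorted(vals)

def mx_quantize_block(block, ebits, mbits):
    """Quantize one block with a shared power-of-two scale, MX-style."""
    grid = fp_grid(ebits, mbits)
    emax = (2 ** ebits - 1) - (2 ** (ebits - 1) - 1)  # largest element exponent
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    scale = 2.0 ** (math.floor(math.log2(amax)) - emax)
    out = []
    for x in block:
        mag = min(abs(x) / scale, grid[-1])        # clamp to the largest representable value
        q = min(grid, key=lambda g: abs(g - mag))   # round to the nearest grid point
        out.append(math.copysign(q * scale, x))
    return out

def rms_error(xs, ebits, mbits, block_size=32):
    sq = 0.0
    for i in range(0, len(xs), block_size):
        block = xs[i:i + block_size]
        q = mx_quantize_block(block, ebits, mbits)
        sq += sum((a - b) ** 2 for a, b in zip(block, q))
    return math.sqrt(sq / len(xs))

random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(4096)]  # stand-in for a weight tensor
err_fp4 = rms_error(weights, ebits=2, mbits=1)  # MXFP4 (E2M1)
err_fp6 = rms_error(weights, ebits=2, mbits=3)  # MXFP6 (E2M3)
print(f"MXFP4 RMS error: {err_fp4:.4f}")
print(f"MXFP6 RMS error: {err_fp6:.4f}")
```

The two extra mantissa bits give E2M3 four times as many points per binade as E2M1 while keeping the same exponent range, which is where most of the error reduction comes from.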
That can only be true if the workload is compute-bound, not memory-bandwidth-bound.
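A back-of-the-envelope roofline sketch of that point. All numbers are hypothetical placeholders (32B active parameters, 8 TB/s of memory bandwidth, 10 PFLOP/s of peak math assumed identical for FP4 and FP6, one 8-bit MX scale per 32 weights), and it ignores KV-cache and activation traffic. Each decode step is modeled as taking the longer of "time to stream the weights" and "time to do the math".

```python
def mx_bits_per_weight(element_bits, block_size=32, scale_bits=8):
    """Storage cost per weight including the shared per-block scale (MX-style)."""
    return element_bits + scale_bits / block_size

def decode_tokens_per_s(batch, bits_per_weight,
                        active_params=32e9,   # hypothetical
                        bandwidth=8e12,       # bytes/s, hypothetical
                        peak_flops=1e16):     # FLOP/s, assumed equal for FP4 and FP6
    # Memory side: every decode step streams all active weights once.
    t_mem = active_params * bits_per_weight / 8 / bandwidth
    # Compute side: ~2 FLOPs (multiply + add) per weight per sequence in the batch.
    t_compute = 2 * active_params * batch / peak_flops
    # Roofline: the step is limited by whichever side is slower.
    return batch / max(t_mem, t_compute)

fp4 = mx_bits_per_weight(4)  # 4.25 bits
fp6 = mx_bits_per_weight(6)  # 6.25 bits
for batch in (1, 4096):
    r4 = decode_tokens_per_s(batch, fp4)
    r6 = decode_tokens_per_s(batch, fp6)
    print(f"batch {batch:>4}: FP4 {r4:,.0f} tok/s, FP6 {r6:,.0f} tok/s, FP4/FP6 = {r4 / r6:.2f}")
```

At batch 1 the step is memory-bound, so FP6 weights cost roughly 6.25/4.25 ≈ 1.5x in throughput even though the FP6 and FP4 math rates match; only at large batches, where compute dominates, do the two formats come out equal under these assumptions.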
Doesn't Nvidia claim that NVFP4 is lossless?

I haven't tested many of the models Nvidia has converted to NVFP4 besides GLM 5.2, but that one seemed fine to me.

My own luck has been hit or miss with it.
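For context on what NVFP4 changes relative to MXFP4, here's a rough Python sketch comparing the two on the same Gaussian stand-in data. It assumes NVFP4 as 16-element blocks of E2M1 values with an FP8 E4M3 block scale, versus MXFP4's 32-element blocks with a power-of-two scale. It omits NVFP4's additional per-tensor FP32 scale and uses plain round-to-nearest everywhere, so treat the numbers as illustrative only; the point is that finer, non-power-of-two scales lower the error without driving it to zero.

```python
import math
import random

# Non-negative FP4 (E2M1) magnitudes, the element type of both MXFP4 and NVFP4.
E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def e4m3_values():
    """Positive finite FP8 E4M3 values (bias 7, no Inf, max 448)."""
    vals = []
    for e in range(16):
        for m in range(8):
            if e == 15 and m == 7:
                continue  # S.1111.111 encodes NaN
            if e == 0:
                v = (m / 8) * 2.0 ** -6           # subnormals
            else:
                v = (1 + m / 8) * 2.0 ** (e - 7)  # normals
            if v > 0:
                vals.append(v)
    return vals

E4M3 = e4m3_values()

def nearest(grid, x):
    return min(grid, key=lambda g: abs(g - x))

def quantize_with_scale(block, scale):
    # Divide by the block scale, round |x| to the nearest E2M1 value
    # (anything past 6 lands on 6), then restore sign and scale.
    return [math.copysign(nearest(E2M1, abs(x) / scale) * scale, x) for x in block]

def mxfp4_block(block):
    # MXFP4-style: shared power-of-two scale; 2 is E2M1's largest exponent.
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    return quantize_with_scale(block, 2.0 ** (math.floor(math.log2(amax)) - 2))

def nvfp4_block(block):
    # NVFP4-style: scale maps amax to 6, then is rounded to an E4M3 value.
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    return quantize_with_scale(block, nearest(E4M3, amax / 6.0))

def rms_error(xs, quantize_block, block_size):
    sq = 0.0
    for i in range(0, len(xs), block_size):
        block = xs[i:i + block_size]
        sq += sum((a - b) ** 2 for a, b in zip(block, quantize_block(block)))
    return math.sqrt(sq / len(xs))

random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(4096)]
err_mx = rms_error(weights, mxfp4_block, block_size=32)
err_nv = rms_error(weights, nvfp4_block, block_size=16)
print(f"MXFP4-style RMS error: {err_mx:.4f}")
print(f"NVFP4-style RMS error: {err_nv:.4f}")
```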

Certainly not lossless. Whether the loss matters depends on the range of values being quantized. When there are outliers far larger than their neighbors, the precision of those neighbors gets wrecked (or the outlier gets clipped), so it's important to use strategies that shrink the largest values or raise the smallest ones, i.e. that reduce the dynamic range. I suspect some models put more effort into that and are therefore more robust when quantized.
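A small Python illustration of the trade-off described above, using plain symmetric INT4 with one scale per group (a simpler scheme than the FP4 formats discussed in the thread). The values and the 0.25 clip threshold are made up. Scaling to the outlier flushes every neighbor to zero; clipping preserves the neighbors but crushes the outlier.

```python
def quantize_int4_symmetric(xs, clip=None):
    """Symmetric INT4 quantization with one shared scale for the whole group."""
    amax = clip if clip is not None else max(abs(x) for x in xs)
    scale = amax / 7  # map amax to the largest positive INT4 code
    codes = [max(-8, min(7, round(x / scale))) for x in xs]
    return [c * scale for c in codes]

def max_err(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

neighbors = [0.05, -0.12, 0.08, 0.2, -0.03, 0.15, -0.18]
outlier = 12.0
group = neighbors + [outlier]

full_range = quantize_int4_symmetric(group)           # scale set by the outlier
clipped = quantize_int4_symmetric(group, clip=0.25)   # scale set by the neighbors

print("neighbors, absmax scale:", full_range[:-1])    # every neighbor rounds to 0.0
print("neighbor error, absmax: ", max_err(neighbors, full_range[:-1]))
print("neighbor error, clipped:", max_err(neighbors, clipped[:-1]))
print("outlier after clipping: ", clipped[-1])
```

Techniques that shrink the outlier before quantizing, such as smaller groups or rescaling and rotating the weights, try to avoid having to pick either failure mode.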
First thing I noticed as well
From memory, it's something like 96-98% of the original accuracy.
Accuracy isn't a meaningful metric here without reference to a specific task.
Additionally, I'd expect quantization to have side effects beyond a slightly lower score on whatever task. You are removing information, and that information might happen to be exactly what the model needs to do something the way you want, even though it's still broadly capable. I'm not sure whether that's really different from "lower performance", but I'm open to hearing your opinions.
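One common way to make "side effects beyond lower performance" measurable is to compare the quantized model's output distribution with the original's, not just its task score, for example via KL divergence and top-1 agreement. The toy sketch below does this for a single random linear layer with per-row symmetric INT4 weights; the layer, sizes, and inputs are all invented for illustration and say nothing about any real model.

```python
import math
import random

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def kl_divergence(p, q):
    """KL(p || q) in nats; p is the full-precision distribution."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def quantize_row_int4(row):
    """Symmetric INT4 with one scale per output row (a hypothetical toy scheme)."""
    amax = max(abs(w) for w in row)
    if amax == 0.0:
        return [0.0] * len(row)
    scale = amax / 7
    return [max(-8, min(7, round(w / scale))) * scale for w in row]

def logits(weights, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

random.seed(1)
vocab, hidden = 50, 64
W = [[random.gauss(0.0, 1.0) for _ in range(hidden)] for _ in range(vocab)]
W_q = [quantize_row_int4(row) for row in W]
inputs = [[random.gauss(0.0, 0.25) for _ in range(hidden)] for _ in range(200)]

agree, total_kl = 0, 0.0
for x in inputs:
    p = softmax(logits(W, x))
    q = softmax(logits(W_q, x))
    agree += p.index(max(p)) == q.index(max(q))  # same top-1 prediction?
    total_kl += kl_divergence(p, q)

top1 = agree / len(inputs)
mean_kl = total_kl / len(inputs)
print(f"top-1 agreement: {top1:.2%}, mean KL: {mean_kl:.4f} nats")
```

High top-1 agreement alongside non-zero KL divergence would mean the most likely output usually survives while the rest of the distribution still shifts.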
And that 2%-4% makes all the difference.
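One hedged way to see why a few percent can matter: if a task needs many steps that must all succeed, small per-step losses compound. The sketch below assumes independent steps and made-up success rates (99% per step for the full model, 97% of that for the quantized one).

```python
def task_success(per_step, steps):
    """Probability that all `steps` independent steps succeed."""
    return per_step ** steps

baseline_step = 0.99           # hypothetical per-step success, full precision
quantized_step = 0.99 * 0.97   # hypothetical: quantized model keeps 97% of that

for steps in (1, 10, 50):
    b = task_success(baseline_step, steps)
    q = task_success(quantized_step, steps)
    print(f"{steps:>3} steps: baseline {b:.3f}, quantized {q:.3f}, ratio {q / b:.3f}")
```

Under these assumptions the quantized model keeps 97% of the baseline's success on a one-step task but only about 22% of it on a 50-step task.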
Yes, it's like saying: "We removed a big chunk of his brain, but look! He can still breathe on his own, swallow food, and walk almost straight, which is like 95% of what he did before!"