This is a very well-established idea called dynamic quantization: vary the quantization bit-width (or skip quantization altogether) layer by layer, guided by a calibration dataset.

EvoPress is the first thing that comes to mind when I think of dynamic quantization.

https://arxiv.org/abs/2410.14649
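
To make the layer-by-layer idea concrete, here is a rough toy sketch in plain Python. It is not EvoPress (which, as I understand it, uses an evolutionary search); it swaps in a simple greedy heuristic instead: start every layer at 8 bits and keep lowering whichever layer hurts the calibration error least until an average bit budget is met. The model (a small stack of random linear layers), the function names, the bit-width choices, and the 4-bit target are all made up for illustration.

```python
import random

def quantize(matrix, bits):
    """Symmetric uniform round-to-nearest quantization of one weight matrix."""
    levels = 2 ** (bits - 1) - 1  # e.g. 7 positive levels for 4-bit
    max_abs = max(abs(w) for row in matrix for w in row) or 1.0
    scale = max_abs / levels
    return [[round(w / scale) * scale for w in row] for row in matrix]

def forward(layers, x):
    """Run x through a stack of linear layers with ReLU between them."""
    for i, matrix in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) for row in matrix]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x

def calibration_error(layers, bit_widths, calib, reference):
    """Mean squared output error of the quantized stack on the calibration set."""
    quantized = [quantize(m, b) for m, b in zip(layers, bit_widths)]
    total = 0.0
    for x, ref in zip(calib, reference):
        out = forward(quantized, x)
        total += sum((a - b) ** 2 for a, b in zip(out, ref)) / len(ref)
    return total / len(calib)

def allocate_bits(layers, calib, target_avg_bits, choices=(8, 4, 3, 2)):
    """Greedy allocation (an assumption, not a published method):
    lower one layer at a time, always picking the least damaging step."""
    reference = [forward(layers, x) for x in calib]  # full-precision outputs
    bit_widths = [choices[0]] * len(layers)
    while sum(bit_widths) / len(bit_widths) > target_avg_bits:
        best = None
        for i, b in enumerate(bit_widths):
            idx = choices.index(b)
            if idx + 1 == len(choices):
                continue  # this layer is already at the lowest allowed width
            trial = bit_widths[:i] + [choices[idx + 1]] + bit_widths[i + 1:]
            err = calibration_error(layers, trial, calib, reference)
            if best is None or err < best[0]:
                best = (err, trial)
        if best is None:
            break  # budget unreachable with the allowed choices
        bit_widths = best[1]
    return bit_widths, calibration_error(layers, bit_widths, calib, reference)

# Hypothetical toy setup: 4 random 8x8 layers, 16 random calibration inputs.
rng = random.Random(0)
dim, depth = 8, 4
layers = [[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
          for _ in range(depth)]
calib = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(16)]

bit_widths, err = allocate_bits(layers, calib, target_avg_bits=4.0)
print("per-layer bit-widths:", bit_widths)
print("calibration MSE:", err)
```

On a real model you would swap the toy stack for actual layers and a proper quantizer, and a smarter search would likely beat the greedy one, but the shape of the loop is the same: propose a per-layer assignment, score it on calibration data, keep the best.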

by buildbot 5 hours ago

This is a thing! For example, https://arxiv.org/abs/2511.06516

by fcpk 5 hours ago

That's brilliant. I wonder why we haven't seen it used much for very heavy quantization.