The big takeaway, in my opinion, is that their technique for LUTs etc. could also be applied to lossy quants. Say, maybe you get 5-bit accuracy in the size of 4-bit?
I don’t know, but maybe? Also, their two-stage design might make current quantized kernel designs better.
And maybe the methods stack, for those willing to trade both costs for the smallest representation.