The paper is about vector quantization, which affects the KV cache, not the model weights or model size.