upvote
reply
Is there an error in the visualization? It shows that every vector is rotated the same amount. My understanding was that they are randomized with different values, which results in a predictable distribution, which is easier to quantize.
reply
Awesome! So it nudges the vectors into stepped polar rays.. It's effectively angle snapping? Plus a sort of magnitude clustering.
reply
Good post but link at the end is broken.

“”” For the full technical explanation with equations, proofs, and PyTorch pseudocode, see the companion post: TurboQuant: Near-Optimal Vector Quantization Without Looking at Your Data.“

reply
I like the visualization, but I don’t understand the grid quantization. If every point is on the unit circle aren’t all the center grid cords unused?
reply
Yeah that's odd. It seems like you'd want an n-1 dimensional grid on the surface of the unit sphere rather than an n dimensional grid within which the sphere resides.

Looking at the paper (https://arxiv.org/abs/2504.19874) they cite earlier work that does exactly that. They object that grid projection and binary search perform exceptionally poorly on the GPU.

I don't think they're using a regular grid as depicted on the linked page. Equation 4 from the paper is how they compute centroids for the MSE optimal quantizer.

Why specify MSE optimal you ask? Yeah so it turns out there's actually two quantization steps, a detail also omitted from the linked page. They apply QJL quantization to the residual of the grid quantized data.

My description is almost certainly missing key details; I'm not great at math and this is sufficiently dense to be a slog.

reply
i think grid can be a surface of the unit sphere
reply
1. Efficient recursive transform of kv embeddings into polar coordinates 2. Quantize resulting angles without the need for explicit normalization. This saves memory via key insight: angles follow a distribution and have analytical form.
reply
Reminds me vaguely of Burrows-Wheeler transformations in bzip2.
reply
That overview is frustratingly high-level. I know what a vector is, a bit, and yet that compression description is crazy uninformative. And that PolarQuant visualization is.. Very abstract.
reply
The way I understand it, it's a way of compressing vectors by switching from their per-component representation to polar coordinates representation, where the nearby vectors are clumped together to a single line, allowing to describe them by different lengths
reply