Efficient GPU execution appears to have been one of the authors' specific aims. Table 2 of their paper shows real-world performance numbers that, at a glance, look compatible with inference workloads.
This is not an LLM inference result. Table 2 is the part I find most questionable. Claiming orders-of-magnitude improvements in vector search over standard methods is an extraordinary claim. If it actually held up in practice, I would have expected to see independent reproductions or real-world adoption by now. It’s been about a year since the paper came out, and I haven’t seen much of either. That doesn’t prove the claim is false, but it certainly doesn’t inspire confidence.
They confirmed the accuracy on NIAH but didn't reproduce the claimed 8x efficiency gain.
Classic academic move. If the authors show accuracy-vs-space charts but hide end-to-end latency, it usually means their code is slower in practice than vanilla fp16 without any compression. Polar coordinates are absolute poison for parallel GPU compute.
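For what it's worth, here's a toy NumPy sketch (my own illustration, not from the paper) of the kind of overhead polar storage implies: the dot product itself is one fused multiply-add per element, but if coordinate pairs are stored as (r, theta) you pay a sin/cos per pair just to decode before doing the same arithmetic.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(1024)
b = rng.standard_normal(1024)

# Cartesian: one multiply-add per element, maps cleanly onto GPU lanes.
dot_cartesian = a @ b

# Hypothetical polar storage of consecutive coordinate pairs.
def to_polar(v):
    x, y = v[0::2], v[1::2]
    return np.hypot(x, y), np.arctan2(y, x)

def from_polar(r, theta):
    out = np.empty(r.size * 2)
    out[0::2] = r * np.cos(theta)
    out[1::2] = r * np.sin(theta)
    return out

ra, ta = to_polar(a)
rb, tb = to_polar(b)

# Mathematically the same dot product, but every pair now costs a
# sin and a cos at decode (plus an atan2 at encode) -- exactly the
# transcendental overhead that hurts throughput on GPUs.
dot_polar = from_polar(ra, ta) @ from_polar(rb, tb)
```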
I don't think they're using polar coordinates? They're quantizing to grid centroids.
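A minimal sketch of what quantizing to grid centroids could look like (my own illustration with made-up grid bounds and level count, not the authors' code): each coordinate is snapped to the center of its cell in a uniform grid, so the reconstruction error per coordinate is bounded by half the cell width, and the decode is pure arithmetic with no trig.

```python
import numpy as np

def quantize_to_grid(x, lo=-1.0, hi=1.0, levels=16):
    """Snap each coordinate to the center of its uniform grid cell."""
    step = (hi - lo) / levels
    idx = np.clip(np.floor((x - lo) / step), 0, levels - 1)
    return lo + (idx + 0.5) * step

x = np.array([0.03, -0.97, 0.51])
q = quantize_to_grid(x)
# Per-coordinate error is at most step/2 = 0.0625 for these settings.
```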