Asymmetric Quantization: Near-Lossless Retrieval with 97% Storage Reduction


(www.mixedbread.com)

95 points

by breadislove2 days ago |

by Zagreus21424 hours ago|

> We evaluated several precision pairings across our internal retrieval benchmark suite. Scores are NDCG@10 averaged across the suite, scaled to 0–100. NDCG@10 (Normalized Discounted Cumulative Gain at rank 10) measures how well the top 10 results are ordered against the ideal ranking, rewarding relevant documents more when they appear higher, with 100 being a perfect ranking. The full-precision baseline averages 90.26. Int8 query against binary documents averages 89.65, a 0.61 point drop, while reducing document-vector storage by 32x

Saying "near lossless" to mean 90% accurate retrieval of saved vectors is simply a lie. Lossiness is binary, not something you can paper over by getting close enough. And 90% is not close. Sure, LLMs are all about gradient descent on noisy data sets, so I guess this is acceptable in this field, but that terminology usage still bothered me.
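For readers unfamiliar with the metric in the quoted passage, here is a minimal sketch of NDCG@10. It is not the article's evaluation code. The function names are hypothetical, and it assumes linear gain, a log2 rank discount, and an ideal ordering computed from the same list of relevance labels (a full evaluation would use every relevant document in the corpus).

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k graded relevance labels."""
    # rank is 0-based, so the discount for the top result is log2(2) = 1
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """NDCG@k on a 0-100 scale: DCG of the given order over DCG of the ideal order."""
    # Assumption: the ideal order is built from the same labels, sorted descending
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return 0.0 if ideal == 0 else 100.0 * dcg_at_k(relevances, k) / ideal

# Relevance labels of the returned results, in the order they were returned
perfect = ndcg_at_k([3, 2, 1, 0, 0])   # already in ideal order
swapped = ndcg_at_k([2, 3, 1, 0, 0])   # top two results swapped

print(f"perfect order: {perfect:.2f}")
print(f"top two swapped: {swapped:.2f}")
```

Swapping the top two results lowers the score even though the same documents are returned, which is the ordering sensitivity the quoted passage describes.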

by kittoes3 hours ago|

I don't believe that's what they were saying at all, though. The claim appears to be that it's near lossless relative to their own baseline that uses float. Which I'd grant, since a 32x storage reduction for a 0.61-point loss in quality (about 0.68% relative to the baseline) is a reasonable trade-off when you've already decided to accept that ~90% is "good enough".

by seritools3 hours ago|

near lossless refers to being 89.65/90.26 = 99.32% of baseline, i'm pretty sure.
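The arithmetic behind these figures can be checked in a few lines. This is only a sketch using the two averages from the quoted passage. The 32-bit baseline width is an assumption (the thread only says the baseline "uses float"), chosen because it matches the stated 32x reduction.

```python
baseline_ndcg = 90.26   # full-precision average, from the quoted passage
asym_ndcg = 89.65       # int8 query vs. binary documents, same passage

absolute_drop = baseline_ndcg - asym_ndcg   # in NDCG points
retained = asym_ndcg / baseline_ndcg        # fraction of baseline quality kept
relative_drop = 1 - retained

# Assumption: 32-bit floats per dimension in the baseline, 1 bit per dimension after
bits_full, bits_binary = 32, 1
compression = bits_full / bits_binary
storage_saved = 1 - 1 / compression

print(f"absolute drop: {absolute_drop:.2f} points")
print(f"retained: {retained:.4f}, relative drop: {relative_drop:.4f}")
print(f"compression: {compression:.0f}x, storage saved: {storage_saved:.5f}")
```

Under that assumption, 32x smaller document vectors means saving 31/32 of the storage, which rounds to the 97% in the title.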

by breadislove2 hours ago|

yes exactly.

by elil177 hours ago|

I would love to see real examples of what reduced quality means in practice. Are you able to recover a document from the vector in a human readable format? If so, what sort of changes come up?

I could imagine a scenario where differences tend to be more substantive than you'd expect because of how less frequent words with fine distinctions in meaning - the very words that make the document special - may be embedded in the vector space.

by yorwba6 hours ago|

Most of the fine distinctions are already lost when a document is processed through a pile of linear algebra to turn it into a fixed-size list of floating-point numbers, as you can see from the NDCG@10. Vector search is not a tool for fine distinctions. It's a tool for reducing a large pile of documents to a smaller selection of candidates, which you can then check individually with some more expensive method.
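The two-stage pattern described here (cheap candidate selection, then a more expensive check) can be sketched as follows. Everything in it is a hypothetical stand-in: `word_overlap` plays the role of the vector search and `phrase_aware` the role of the expensive method. Neither comes from the article.

```python
import heapq

def retrieve_then_rerank(query, docs, cheap_score, expensive_score, k=3):
    """Stage 1: keep the k best docs by a cheap score. Stage 2: reorder them with a costly one."""
    candidates = heapq.nlargest(k, docs, key=lambda d: cheap_score(query, d))
    return sorted(candidates, key=lambda d: expensive_score(query, d), reverse=True)

# Toy stand-in for vector search: count shared words (cheap, coarse)
def word_overlap(query, doc):
    return len(set(query.lower().split()) & set(doc.lower().split()))

# Toy stand-in for the expensive check: also reward an exact phrase match
def phrase_aware(query, doc):
    return word_overlap(query, doc) + (10 if query.lower() in doc.lower() else 0)

docs = [
    "binary quantization of document vectors",
    "vectors and quantization for binary search trees",
    "a recipe for sourdough bread",
    "notes on binary quantization trade-offs",
]
ranked = retrieve_then_rerank("binary quantization", docs, word_overlap, phrase_aware, k=3)
for doc in ranked:
    print(doc)
```

The cheap stage cannot tell the second document apart from the first and fourth. The expensive stage pushes it to the bottom of the shortlist.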

by breadislove2 hours ago|

The NDCG loss is minimal: 90.26 -> 89.65. This means it maintains most of the quality.

by breadislove2 hours ago|

this is the reason why we report NDCG and not recall. NDCG respects fine-grained details, so you get an overview of how much detail you are trading off, since any loss would hurt the ranking.

by purple-leafy7 hours ago|

Hey breadislove; amazing article, I’ll be sending mixedbread an email in the morning that may interest you (email will be <5-characters>@pm.me)

I have also been working in compression and performance engineering, and managed to get a 99+% compression unlock versus conventional approaches (100+KB down to 1KB) in the scenario of 30-minute massively multiplayer game replays for a “game+engine” I’m developing.

I think there’s a synergy between these 2 concepts; I’d love to chat some more.

by palinnilap4 hours ago|

Any way I can read about this or the use case? I have a hobby interest

by breadislove2 hours ago|

to which email did you send it? can u send it to support please?

by derrickquinn2 hours ago|

Asymmetry is clever. FWIW, this is very similar to the strategy employed by BitNet models (i.e., int8 activations with binary or ternary weights); I suspect retrieval is a little more amenable to this approach.

In principle, binary x binary should be pretty fast since it just requires bitwise XNOR and popcount/reduction, but in practice it's slow unless you've really optimized it. And, as stated in the article, you'd still be losing a lot of accuracy that way.
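A rough sketch of the two scoring schemes mentioned here, in plain Python for clarity rather than speed. All function names are hypothetical, and it assumes binary vectors encode the sign of each dimension as {-1, +1}. The thread does not say how the article actually scores int8 queries against binary documents.

```python
def pack_signs(vec):
    """Pack the sign of each component into one int (bit i is set when vec[i] > 0)."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits

def quantize_int8(vec):
    """Scale so the largest magnitude maps to 127, then round to integers."""
    scale = max(abs(x) for x in vec) or 1.0
    return [round(127 * x / scale) for x in vec]

def binary_dot(a_bits, b_bits, dim):
    """Dot product of two {-1,+1} vectors from packed bits via XNOR + popcount."""
    mask = (1 << dim) - 1
    agree = bin(~(a_bits ^ b_bits) & mask).count("1")  # XNOR, then popcount
    return 2 * agree - dim  # agreements minus disagreements

def asymmetric_dot(query_int8, doc_bits):
    """int8 query vs. binary document: add q[i] where the doc bit is set, else subtract it."""
    return sum(q if (doc_bits >> i) & 1 else -q for i, q in enumerate(query_int8))

doc = [0.3, -1.2, 0.8, -0.1]
query = [0.5, -0.9, -0.2, 0.4]

sym_score = binary_dot(pack_signs(query), pack_signs(doc), len(doc))
asym_score = asymmetric_dot(quantize_int8(query), pack_signs(doc))
print(sym_score, asym_score)
```

In this toy case the binary x binary score only sees that two signs agree and two disagree, while the int8 query keeps the magnitudes that say which dimensions matter most. A real implementation would pack bits into machine words and use hardware popcount instead of Python integers.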

by kaizenite4 hours ago|

To people smarter than me, how impressive and/or revolutionary is this?

by functionmouse7 hours ago|

there is no such thing as "near lossless"

by ttoinou6 hours ago|

There is, after you define what you’re ready to lose and understand the lossy space. That’s how we came up with cellphones, audio and video codecs, etc. Literally powering all modern devices we use.

by greenleafone74 hours ago|

So then ... "lossy"

by tancop1 hours ago|

theres a big difference between 99% quality and 30%. near lossless is a good name for the first one. if you treat it in a binary way where everything short of 100 falls into one "lossy" bucket you lose all the practical differences that make one encoding much better than another.

by functionmouse55 minutes ago|

> theres a big difference between 99% quality and 30%.

sure

> if you treat it in a binary way where everything short of 100 falls into one "lossy" bucket you lose all the practical differences that make one encoding much better than another.

no; lossless is an inherently binary term. and I don't lose all the practical differences of better lossy encoders by understanding that; I'm not just going to start using mp3 96k because I have an understanding of lossless vs lossy encoders...

Lossless is an objectively binary term.

by functionmouse4 hours ago|

Actually, all of those things are considered "lossy".

by ttoinou3 hours ago|

Yes, anything not lossless is lossy. Near-lossless is not lossless, so it is lossy. I hope we speak the same language

by alfiedotwtf4 hours ago|

If you squint hard enough, it sounds like their storage layer is a bloom filter

by rq18 hours ago|

The Pi compression algorithm is better.

by luma5 hours ago|

Doubtful. The problem with the pi idea is that you need to include the offset, which will likely be as long as or longer than your data.

by nathan_compton5 hours ago|

" A single document produces more then one embedding, depending on the complexity of the document it can produce hundreds or thousands of vectors."

That typo up there is kind of endearing in the AI slop era.

by HenryMulligan3 hours ago|

Not seeing a typo in your quote. Can you point it out?

by thatspartan2 hours ago|

I think they're referring to "then" vs "than"

by breadislove2 hours ago|

ah whoops, I'll fix it. ty!

by vasylvd56 minutes ago|

[flagged]

by dismissed1811 hours ago|

[dead]

by m_m_carvalho5 hours ago|

[dead]

by mv_d5339e319 hours ago|

[dead]

by johnathan10110 hours ago|

[flagged]

by TradingReality1 days ago|

[flagged]

by Ameo9 hours ago|

[flagged]

by mwigdahl3 hours ago|

Unfortunately as cost reduction trends to 100%, it comes along with an intrinsic high-pass sarcasm filter.

by throwaway20276 hours ago|

You would obviously be trading storage for compute and time to retrieve the storage.

by throwaw128 hours ago|

100% reduction is impossible for something which should work, because -100% means it is now 0

by neonstatic8 hours ago|

They were clearly being sarcastic

by peheje8 hours ago|

Reminds me of 'Learning to be me' by Greg Egan
