upvote
Catching accidental drift is still worth a lot. It's basically the same idea as performance regression tests in CI: nobody writes those because they expect sabotage. It's for the boring stuff, like "oops, we bumped a dep and throughput dropped 15%".
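The CI analogy can be made concrete with a minimal sketch. (`check_throughput` and the 10% tolerance are assumptions for illustration, not anything a specific provider or CI system ships.)

```python
def check_throughput(current_tps: float, baseline_tps: float,
                     tolerance: float = 0.10) -> bool:
    """Return True if throughput is within `tolerance` of the recorded baseline.

    A CI job would record baseline_tps on a known-good build and fail the
    pipeline when a new build regresses past the tolerance -- catching the
    accidental "we bumped a dep and got 15% slower" case, not sabotage.
    """
    drop = (baseline_tps - current_tps) / baseline_tps
    return drop <= tolerance

# A 5% drop passes; the 15% drop from the example above fails the build.
ok_small_drop = check_throughput(95.0, 100.0)   # True
ok_big_drop = check_throughput(85.0, 100.0)     # False
```

The same pattern works for quality metrics (e.g. eval scores) instead of tokens/sec; only the baseline and tolerance change.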

If someone actually goes out of their way to bypass the check, that's a pretty different situation legally compared to just quietly shipping a cheaper quant anyway.

reply
Also it's not just about running an obviously worse quant.

Running different GPU kernels / inference engines also matters. It's easy to write an implementation that is faster and thus cheaper but numerically much noisier / less accurate.
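A toy sketch of why lower precision is numerically noisier: snapping values to a coarser grid makes every element's rounding error larger. (`quantize` here is a hypothetical illustrative helper; real int8/int4 inference schemes use per-block scales and are considerably more involved.)

```python
import random

random.seed(0)

def quantize(x: list[float], bits: int) -> list[float]:
    # Toy symmetric quantization: snap each value onto a grid
    # with 2**bits levels spanning the vector's dynamic range.
    scale = max(abs(v) for v in x) / (2 ** (bits - 1) - 1)
    return [round(v / scale) * scale for v in x]

vec = [random.gauss(0.0, 1.0) for _ in range(4096)]

# Worst-case per-element rounding error at each precision.
# Fewer grid points -> coarser rounding -> more noise.
err_8bit = max(abs(q - v) for q, v in zip(quantize(vec, 8), vec))
err_4bit = max(abs(q - v) for q, v in zip(quantize(vec, 4), vec))
```

The errors compound through every matmul in the forward pass, which is how a "faster, cheaper" kernel can quietly degrade output quality.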

reply
Yeah, the threat model is nonexistent. Most people use a dozen or so well known providers, who have no incentive to cheat so obviously.
reply
Providers like OpenRouter default to the cheapest provider. They are often cheap because they are ridiculously quantized and tuned for throughput, not quality.

This is probably Kimi trying to protect their brand from bargain basement providers that don't properly represent what the models are capable of.

reply
Openrouter has “exacto” verified models trying to combat this, but it seems like it’s not available for most of the models.
reply
> This is probably Kimi trying to protect their brand from bargain basement providers that don't properly represent what the models are capable of.

I'm curious what exactly they mean by this...

"because we learned the hard way that open-sourcing a model is only half the battle."

reply
Yes and no.

For a truly malicious actor, you're right. But it shifts it from "well we aren't obviously committing fraud by quantizing this model and not telling people" to "we're deliberately committing fraud by verifying our deployment with one model and then serving customer requests with another".

I suspect there are a lot of semi-malicious actors who are only too happy to do the former.

reply
Seems like a great challenge for all these systems; see frontier labs serving quants when under heavy load.
reply
[dead]
reply