If someone actually goes out of their way to bypass the check, that's a pretty different situation legally from just quietly shipping a cheaper quant anyway.
Running different GPU kernels / inference engines also matters. It's easy to write an implementation that is faster and thus cheaper but numerically much noisier / less accurate.
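Here's a rough toy illustration of the "faster but noisier" point. It isn't taken from any real inference engine. It simulates a dot product whose running accumulator is kept in fp16 versus fp32, using Python's `struct` module to round to those formats, and compares both against an exactly rounded reference. The function names, the 4096-element size and the fp16 inputs are all made up for the example.

```python
import math
import random
import struct


def round_to(fmt: str, x: float) -> float:
    """Round a Python float (float64) to the IEEE format given by a struct code:
    'e' = half precision (fp16), 'f' = single precision (fp32)."""
    return struct.unpack("<" + fmt, struct.pack("<" + fmt, x))[0]


def dot_with_accumulator(a, b, fmt: str) -> float:
    """Dot product where each product and every running sum is rounded to `fmt`.
    This mimics a (hypothetical) kernel that keeps its accumulator in that precision."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_to(fmt, acc + round_to(fmt, x * y))
    return acc


def dot_reference(a, b) -> float:
    """Accurate reference: exact float64 products summed with math.fsum."""
    return math.fsum(x * y for x, y in zip(a, b))


def accumulation_errors(n: int = 4096, seed: int = 0):
    """Return (fp32 error, fp16 error) relative to the reference."""
    rng = random.Random(seed)
    # Inputs are stored in fp16 so every variant sees identical values;
    # only the precision of the accumulation differs between the two runs.
    a = [round_to("e", rng.uniform(-1.0, 1.0)) for _ in range(n)]
    b = [round_to("e", rng.uniform(-1.0, 1.0)) for _ in range(n)]
    ref = dot_reference(a, b)
    err_fp32 = abs(dot_with_accumulator(a, b, "f") - ref)
    err_fp16 = abs(dot_with_accumulator(a, b, "e") - ref)
    return err_fp32, err_fp16


if __name__ == "__main__":
    err_fp32, err_fp16 = accumulation_errors()
    print(f"fp32 accumulator error: {err_fp32:.2e}")
    print(f"fp16 accumulator error: {err_fp16:.2e}")
```

The fp16-accumulator result ends up noticeably further from the reference, because small products get swamped once the running sum grows. Real kernels differ in more ways than this (reduction order, fused ops, quantized weights), so treat it only as a picture of the kind of drift meant.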
This is probably Kimi trying to protect their brand from bargain-basement providers that don't properly represent what the models are capable of.
I'm curious what exactly they mean by this...
"because we learned the hard way that open-sourcing a model is only half the battle."
For a truly malicious actor, you're right. But it shifts it from "well we aren't obviously committing fraud by quantizing this model and not telling people" to "we're deliberately committing fraud by verifying our deployment with one model and then serving customer requests with another".
I suspect there are a lot of semi-malicious actors who are happy to do the former but would stop short of the latter.