I say this as a relatively frequent user of Kimi models and generally a big fan. But on not-yet-gamed benchmarks like DeepSWE, Kimi K2.6 is beaten soundly by Claude Sonnet 4.6 ($3 / $15) and even slightly by GPT 5.4 Mini ($0.75 / $4.50).
There's no question Kimi models are very good for a lot of code tasks. They're the best-quality open-weight models. But to get overall outcomes similar to Sonnet/Opus, on average you'll spend many more tokens and will have to do more managing of the model. Don't look at price per token; look at how much you pay for the entire process.
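To make the "entire process" point concrete, here's a rough sketch of cost per finished task. The Sonnet prices are the ones quoted above ($3 / $15 per million tokens); the cheaper model's prices, all token counts, and the retry factor are made-up placeholders, not measurements of any real model.

```python
def cost_per_task(input_tokens, output_tokens,
                  price_in_per_m, price_out_per_m,
                  expected_attempts=1.0):
    """Expected dollar cost to finish one task.

    Prices are dollars per million tokens. expected_attempts is the
    average number of runs needed before the result is acceptable.
    """
    one_run = (input_tokens / 1_000_000 * price_in_per_m
               + output_tokens / 1_000_000 * price_out_per_m)
    return one_run * expected_attempts


# Pricier model at $3 / $15 (from the comment above).
# Token counts are hypothetical.
pricey = cost_per_task(200_000, 20_000, 3.00, 15.00)

# Hypothetical cheaper model: lower per-token price, but it burns more
# tokens and needs more retries. All four numbers are assumptions.
cheap = cost_per_task(500_000, 60_000, 1.00, 4.00, expected_attempts=1.5)

print(f"pricey model: ${pricey:.2f} per task")
print(f"cheap model:  ${cheap:.2f} per task")
```

With these invented numbers the model that's cheaper per token ends up costing more per task. Plug in your own measured token usage and retry rate; the result can easily go the other way.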
One major thing DeepSWE has going for it that all other benchmarks (including those quoted by MoonshotAI on this page) don't: the other benchmarks are completely gamed. Their answers are public and part of each model's training data. This benchmark may still be iffy, but at least it's not gamed.
Everybody has incentives to manipulate benchmark results to show their models in the best light.
The reality is that $20/$100/$200/mo feels reasonable to a lot of people relative to the value they're getting out of Claude, and if they switch to something else, there's a risk that it won't be as good, and they'll have a new tool to learn.
It's not an insurmountable moat, but don't underestimate the user experience. The iPod didn't win because it was the cheapest device or the one with the most features.
I also wonder if enterprises have deals for API pricing that isn't posted publicly, so all we see is a high sticker price.
It's only marginally better in the things it's actually comparable to. Anthropic's models are MUCH better in many more things, e.g. things Kimi/etc. didn't distill.
For those things the difference is like a cliff.
I'd further say that there are probably enough rational actors running evals out there that the "marginally better" is not pure vibes in the cases where people are spending lots of money, but I only have direct line of sight to some of those eval suites. Maybe everyone is irrational and Anthropic is exploiting that!
But if AI doesn't quickly lead to the large-scale replacement of workers that was promised, I could definitely see the C-suite and their gaggle of consultants starting to ask questions about token pricing.
Lots of US providers are hosting these “open source” models so doubt that’s the problem.