The value of open source is not that you will run it locally, it's that anyone can run it at all.
Even if you can't afford the hardware to run large open source models yourself, someone who can will, price the service at half the cost of the closed source models, and still make a profit.
The only reason you are not seeing that happen right now is that the current front-running token providers have subsidised their inference costs.
The minute they start their enshittification, the market for alternatives becomes viable. Without open source models, there will never be a viable alternative.
Even if they wanted to charge only 80% of what a developer costs, the existence of open source models that are not far behind is a forcing function on them. There is no moat for them.
Of course it matters because that makes coding plans much cheaper than those from Anthropic and OpenAI.
For personal use I have coding plans with GLM 5.1, Kimi K2.6, MiniMax M2.7 and Xiaomi MiMo V2.5 Pro and I am getting a lot of bang for the buck.
With Claude Max I was hitting the limits very fast.
The enshittification will go unnoticed at first, but I'm already finding my favourite frontier models severely nerfed, doing incredibly dumb stuff they weren't doing in the past.
We need open weight models to have a stable "platform" when we rely on them, which we do more and more.
That said, I do fully agree that it is valuable to have open near-frontier models, as a balance to the closed ones.
That would take something close to a global conspiracy of every technologist lying continuously to keep the tweaks secret. If necessary, I personally will rent some servers and run a vanilla Kimi K2.6 deployment for people to use at reasonable prices. I don't expect to ever make good on that threat, because those would be grim times indeed if I'm the first person doing something AI related, but the skill level required to load up a model behind an API is low.
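For illustration, a sketch of how little that takes with an off-the-shelf inference server like vLLM. The model id and GPU count here are assumptions for the example; any open-weights checkpoint you can fit on your hardware works the same way:

```shell
# Sketch: stand up an open-weights model behind an OpenAI-compatible API.
# Assumes vLLM is installed and the checkpoint fits across your GPUs;
# the model id below is illustrative, not a recommendation.
pip install vllm

vllm serve moonshotai/Kimi-K2-Instruct \
    --tensor-parallel-size 8 \
    --port 8000

# Any OpenAI-compatible client can now talk to
# http://localhost:8000/v1/chat/completions
```

That's the whole "deployment": one long-running process and an HTTP port. Everything else (billing, rate limits) is ordinary web plumbing.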
So it isn't hard to see how there will be unadulterated Kimi models available, and from there it is really, really straightforward to tell if someone is quantising a model: just run some benchmarks against two different providers who both claim to serve the same thing. If one is quantising and another isn't, there's a big difference in quality.
Sure. But the problem is you have to do this continuously to have any measure of confidence, which is expensive. For example, a provider could at any point start serving some fraction of requests from a quantised model, either due to a "routing error", as Anthropic called one of their model degradation incidents, or to improve the bottom line.
There's really no good way to detect this on a few-prompt level without overspending significantly, because they're all black boxes.
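A minimal sketch of the comparison itself, assuming you have already run the same benchmark suite (here, 200 tasks) against both providers. The task counts are made up for illustration; a simple two-proportion z-test says whether the pass-rate gap is noise or a real quality difference:

```python
import math

def two_proportion_z(passes_a: int, n_a: int, passes_b: int, n_b: int) -> float:
    """Z-statistic for the difference between two providers' pass rates.

    A large |z| (roughly > 2) means the gap is unlikely to be sampling
    noise, i.e. the providers are probably not serving the same model.
    """
    p_a, p_b = passes_a / n_a, passes_b / n_b
    # Pooled pass rate under the null hypothesis that both serve the same model.
    p_pool = (passes_a + passes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical numbers: provider A solves 180/200 tasks, provider B 150/200.
z = two_proportion_z(180, 200, 150, 200)
print(f"z = {z:.2f}")
```

The catch the parent comment raises still applies: if a provider only degrades a fraction of requests, or only sometimes, you need to keep re-running this, which is exactly the ongoing cost that makes per-prompt detection impractical.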
You can always distill this for your little RTX at home. But models shaped for consumer hardware will never win wide adoption or remain competitive with frontier labs.
This is something that _can_ compete. And it will both necessitate and inspire a new generation of open cloud infra to run inference: "push button, deploy" or "push button, fine tune" shaped products at the start, then far more advanced products that only open weights, not models locked behind an API, can accomplish.
Now we just need open weights Nano Banana Pro / GPT Image 2, and Seedance 2.0 equivalents.
The battle and focus should be on open weights for the data center.
Open weights is great if you want to do additional training, or if you need on-prem for security.
The power of giving universities, companies, and hackers "full" models cannot be overstated.
Here are just a few ideas for image, video, and creative media models:
- Suddenly you're not "blocked" for entirely innocuous prompts. This is a huge issue.
- You can fine tune the model to do new things: a lighting adjustment model, a pose adjustment model. You can hook the model up to mocap, train it to generate plates, etc.
- You can fine tune it on your brand aesthetic and not have it washed out.