The value of open source is not that you will run it locally, it's that anyone can run it at all.
Even if you can't afford the hardware to run large open source models yourself, someone who can will, price the service at half the cost of the closed source models, and still make a profit.
The only reason you are not seeing that happen right now is that the current front-running token providers have subsidised their inference costs.
The minute they start their enshittification, the market for alternatives becomes viable. Without open source models, there will never be a viable alternative.
Even if they wanted to charge only 80% of what a developer costs, the existence of open source models that are not far behind is a forcing function on them. There is no moat for them.
Of course it matters because that makes coding plans much cheaper than those from Anthropic and OpenAI.
For personal use I have coding plans with GLM 5.1, Kimi K2.6, MiniMax M2.7 and Xiaomi MiMo V2.5 Pro and I am getting a lot of bang for the buck.
With Claude Max I was hitting the limits very fast.
The enshittification will go unnoticed at first, but I'm already finding my favourite frontier models severely nerfed, doing incredibly dumb stuff they weren't doing in the past.
We need open weight models to have a stable "platform" when we rely on them, which we do more and more.
That said, I do fully agree that it is valuable to have open near-frontier models, as a balance to the closed ones.
That would take something close to a global conspiracy of every technologist lying continuously to keep the tweaks secret. If necessary, I personally will rent some servers and run a vanilla Kimi K2.6 deployment for people to use at reasonable prices. I don't expect to ever make good on that threat, because those would be grim times indeed if I'm the first person doing something AI related, but the skill level required to load up a model behind an API is low.
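For illustration, a sketch of how little that takes with an off-the-shelf inference server like vLLM. The model id and GPU count here are assumptions for the example; any open-weights checkpoint you can fit on your hardware works the same way:

```shell
# Sketch: stand up an open-weights model behind an OpenAI-compatible API.
# Assumes vLLM is installed and the checkpoint fits across your GPUs;
# the model id below is illustrative, not a recommendation.
pip install vllm

vllm serve moonshotai/Kimi-K2-Instruct \
    --tensor-parallel-size 8 \
    --port 8000

# Any OpenAI-compatible client can now talk to
# http://localhost:8000/v1/chat/completions
```

That's the whole "deployment": one long-running process and an HTTP port. Everything else (billing, rate limits) is ordinary web plumbing.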
So it isn't hard to see how there will be unadulterated Kimi models available, and from there it is really, really straightforward to tell if someone is quantising a model: just run some benchmarks against two different providers who both claim to serve the same thing. If one is quantising and another isn't, there's a big difference in quality.
Sure. But the problem is you have to do this continuously to have any measure of confidence, which is expensive. For example, a provider could at any point start serving some fraction of requests from a quantised model, either due to a "routing error", as Anthropic called one of their model degradation incidents, or to improve the bottom line.
There's really no good way to detect this on a few-prompt level without overspending significantly, because they're all black boxes.
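A minimal sketch of the comparison itself, assuming you have already run the same benchmark suite (here, 200 tasks) against both providers. The task counts are made up for illustration; a simple two-proportion z-test says whether the pass-rate gap is noise or a real quality difference:

```python
import math

def two_proportion_z(passes_a: int, n_a: int, passes_b: int, n_b: int) -> float:
    """Z-statistic for the difference between two providers' pass rates.

    A large |z| (roughly > 2) means the gap is unlikely to be sampling
    noise, i.e. the providers are probably not serving the same model.
    """
    p_a, p_b = passes_a / n_a, passes_b / n_b
    # Pooled pass rate under the null hypothesis that both serve the same model.
    p_pool = (passes_a + passes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical numbers: provider A solves 180/200 tasks, provider B 150/200.
z = two_proportion_z(180, 200, 150, 200)
print(f"z = {z:.2f}")
```

The catch the parent comment raises still applies: if a provider only degrades a fraction of requests, or only sometimes, you need to keep re-running this, which is exactly the ongoing cost that makes per-prompt detection impractical.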
You can always distill this for your little RTX at home. But models shaped for consumer hardware will never win wide adoption or remain competitive with frontier labs.
This is something that _can_ compete. And it will both necessitate and inspire a new generation of open cloud infra to run inference: "push button, deploy" or "push button, fine tune" shaped products at the start, then far more advanced products that only open weights, not models locked behind an API, can accomplish.
Now we just need open weights Nano Banana Pro / GPT Image 2, and Seedance 2.0 equivalents.
The battle and focus should be on open weights for the data center.
Open weights is great if you want to do additional training, or if you need on-prem for security.
The power of giving universities, companies, and hackers "full" models cannot be overstated.
Here are just a few ideas for image, video, and creative media models:
- Suddenly you're not "blocked" for entirely innocuous prompts. This is a huge issue.
- You can fine tune the model to do new things: a lighting adjustment model, a pose adjustment model. You can hook the model up to mocap, train it to generate plates, etc.
- You can fine tune it on your brand aesthetic and not have it washed out.