> Unless you mean that releasing open weights models is the loss leader, in which case, you might be right but I hope you're wrong.

This is specifically what I meant.

DeepSeek’s official service is trying to recoup some of the training and engineering costs too.

The other providers only have to recoup their hardware costs and the cost of a team to run it.

Even though DeepSeek’s official service is more expensive per token, they’re running at a lower profit than the OpenRouter providers because they had to pay for the R&D.

This is a deliberate choice. We already see it with Qwen splitting their releases between open-weight and hosted-only models. The open weights are a loss leader to get attention. Without them, you’d almost never hear about their hosted models.

by aftbit 17 minutes ago

Except DeepSeek's official service is _less_ expensive per token, which suggests they're underpricing it substantially as well to attempt to draw more attention / more data.

by Sebb767 4 hours ago

> I hope there's always someone willing to make this bet and release better and better open models.

What would this bet be? Training is expensive and open weights mean that for hosting you compete on price with people that don't have this item on their bill.

by aftbit 4 hours ago

"Attention is all you need" - the larger bet is that by releasing your models open-weight, you'll get more attention and mindshare than if you tried to jump in to compete with the major closed providers, and the value of that attention will outweigh the cost of the training run.

So far, it's really only the Chinese labs (and FAIR or whatever Meta's project is called now) that are doing this. Oh yeah, and Google's Gemma.

At the moment, this is all massively distorted by the prestige and investment money flowing into the space. None of the labs have to charge the real cost of inference, let alone the marginal cost of training, because they are instead lighting investment money on fire to cover that.

One imagines (though I have not investigated in detail) that there's a degree of national prestige work going on too. The Chinese labs are trying to show that they can build better and more efficient models and are releasing open to undercut the US labs.