> it may underestimate the inference margins of Ant/OAI's API pricing.

If true, then why is neither Anthropic nor OpenAI dropping their API pricing to gain market share, when both are clearly doing all sorts of political and PR maneuvering to compete in a cutthroat market?

Since they aren't dropping API usage prices (and are in fact raising them in a lot of subtle ways), one of these options almost has to be true: they are still subsidizing inference; training costs are so ridiculously high that they need to make huge profits off inference or collapse in on themselves; or they are price fixing.

The training costs are very likely the reason. Dario has talked about how each individual model is profitable, but the expenditure on training the next generation of models makes the company look unprofitable at any given moment in time, and I believe he's being honest about that.
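
A toy version of that accounting, with made-up numbers (these are not Anthropic's figures): assume each model costs 10x more to train than the previous one and earns back 2x its own training cost, net of inference, in the following year. `yearly_pnl` is a hypothetical helper written just for this sketch.

```python
# Hypothetical training cost per model generation, in $M (illustrative only).
TRAIN_COST = [100.0, 1_000.0, 10_000.0]
# Assumption: each model returns this multiple of its training cost
# (net of inference cost) in the year after it is trained.
RETURN_MULTIPLE = 2.0


def per_model_profit(train_cost, return_multiple):
    """Lifetime profit of each model, viewed on its own."""
    return [cost * return_multiple - cost for cost in train_cost]


def yearly_pnl(train_cost, return_multiple):
    """Company-level profit per year: last year's model earns money
    while this year's (more expensive) model is being trained."""
    pnl = []
    for year, cost in enumerate(train_cost):
        revenue = train_cost[year - 1] * return_multiple if year > 0 else 0.0
        pnl.append(revenue - cost)
    return pnl


print("per-model profit:", per_model_profit(TRAIN_COST, RETURN_MULTIPLE))
print("yearly company P&L:", yearly_pnl(TRAIN_COST, RETURN_MULTIPLE))
```

Under these assumptions every model is profitable on its own, yet the company shows a growing loss every year as long as training spend keeps scaling up.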

The market for open-weight model hosting gives you an idea of the profitable price floor, and it's pretty clear there's markup baked into OAI's and Anthropic's APIs.

> If true, then why is neither Anthropic nor OpenAI dropping their API pricing

They are? In the before times of 2025, Opus 4.1 was $75 per million tokens. Opus 4.8 is $25, and Fable is/was $50.

Given my experience hosting these models at scale and optimizing their load, I don't think the margins are nearly as high as 75% if the models are as big as people often claim.

I don't know why DeepSeek is so cheap, but realistic pricing should be around their initial price, which was 4x the current one. At that price you get a healthy 25-50% margin depending on occupancy, given that DeepSeek V4 is a very sparse MoE model.

GLM 5.2, for example, doesn't have more than 30-50% margins, and that's assuming old GPU pricing; with current inflated GPU pricing, I'm certain the margins are lower. Of course, you can host for cheaper with quantization, or if you have very consistent capacity and utilization, which is not the norm for AI workloads.

Overall, for large models like GPT 5.5 or Opus, margins should be healthier, around 50-70%, assuming GPU pricing didn't increase for these companies. Even if it did, a 30-40% margin should still be possible, even in the worst case where every GPU they have saw a jump in price.
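
To make the occupancy point concrete, here's a rough sketch of how margin moves with utilization and GPU price. Every input (the $18/hour replica cost, 2,500 tokens/s peak throughput, $4 per million tokens) is made up for illustration, and `gross_margin` is a hypothetical helper, not anyone's real cost model.

```python
def gross_margin(price_per_mtok, gpu_cost_per_hour, peak_tokens_per_sec, utilization):
    """Gross margin on inference for one serving replica.

    utilization is the fraction of peak throughput actually sold (0 < u <= 1).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = peak_tokens_per_sec * utilization * 3600
    cost_per_mtok = gpu_cost_per_hour / tokens_per_hour * 1_000_000
    return (price_per_mtok - cost_per_mtok) / price_per_mtok


# Illustrative assumptions only.
PRICE = 4.0       # $ per million tokens charged to customers
GPU_COST = 18.0   # $ per hour for the whole replica
PEAK_TPS = 2500   # tokens per second at full occupancy

for util in (1.0, 0.8, 0.5):
    margin = gross_margin(PRICE, GPU_COST, PEAK_TPS, util)
    print(f"utilization {util:.0%}: margin {margin:.1%}")

# Same replica at full occupancy, but with GPU cost up 50%.
print(f"GPU cost +50%: margin {gross_margin(PRICE, GPU_COST * 1.5, PEAK_TPS, 1.0):.1%}")
```

With these made-up inputs the margin is 50% at full occupancy, falls to zero at 50% occupancy, and halves to 25% if the GPU cost rises by half, which is why both utilization and GPU pricing dominate the estimate.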

For smaller models it's hard to say. I would guess 20%, but these models might be much smaller than I suspect, in which case the margin might be double that.

Note that the issue is that cheaper, less intelligent tokens don't scale down linearly in memory usage, which is the biggest pain point of serving models. Context sizes have fucked us all.

Also, anyone claiming OAI makes lower margins on its API might be wrong, given that they serve a much smaller context size. 1M context is definitely a lot more expensive to serve, especially with smaller models like Sonnet.
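
A back-of-the-envelope sketch of why long context is expensive, assuming plain attention with grouped KV heads and an fp16 cache (architectures that compress the cache or use sliding windows will differ). The 64-layer, 8-KV-head, 128-dim config is hypothetical and not any vendor's real model.

```python
GIB = 1024 ** 3


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """KV-cache size for ONE sequence: keys and values (the leading 2x)
    are stored for every layer, KV head, and token in the context."""
    return 2 * num_layers * num_kv_heads * head_dim * context_tokens * bytes_per_value


# Hypothetical model config, fp16 cache (2 bytes per value).
LAYERS, KV_HEADS, HEAD_DIM = 64, 8, 128

for ctx in (128 * 1024, 1024 * 1024):
    size = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, ctx)
    print(f"{ctx:>9,} tokens: {size / GIB:.0f} GiB per sequence")
```

For this assumed config a single sequence needs 32 GiB of cache at 128k tokens and 256 GiB at 1M. The cache grows linearly with context no matter how small the weights are, so a cheaper model does not get proportionally cheaper to serve at long context.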


> Neither Anthropic nor OpenAI allow Business nor Enterprise customers access to the high value $200/mo plan.

They may not "allow" it, but I've seen firsthand enterprises encourage employees to use these accounts personally and get reimbursed later, as a cost control measure to avoid pay-as-you-go (with limits) pricing for users who do tokenmaxing.