by Someone1234 6 hours ago

by simonw 6 hours ago

The cost per token served has been falling steadily over the past few years across basically all of the providers. OpenAI dropped the price they charged for o3 to 1/5th of what it was in June last year thanks to "engineers optimizing inferencing", and plenty of other providers have found cost savings too.

Turns out there was a lot of low-hanging fruit in terms of inference optimization that hadn't been plucked yet.

> A year or more ago, I read that both Anthropic and OpenAI were losing money on every single request even for their paid subscribers

Where did you hear that? It doesn't match my mental model of how this has played out.

by cootsnuck 6 hours ago

I have not seen any reporting or evidence at all that Anthropic or OpenAI is able to make money on inference yet.

> Turns out there was a lot of low-hanging fruit in terms of inference optimization that hadn't been plucked yet.

That does not mean the frontier labs are pricing their APIs to cover their costs yet.

It can both be true that it has gotten cheaper for them to provide inference and that they still are subsidizing inference costs.

In fact, I'd argue that's way more likely given that has been precisely the go-to strategy for highly competitive startups for a while now. Price low to pump adoption and dominate the market, worry about raising prices for financial sustainability later, burn through investor money until then.

What no one outside of these frontier labs knows right now is how big the gap is between current pricing and eventual pricing.

by chis 5 hours ago

It's quite clear that these companies do make money on each marginal token. They've said this directly and analysts agree [1]. It's less clear that the margins are high enough to pay off the up-front cost of training each model.

[1] https://epochai.substack.com/p/can-ai-companies-become-profi...

by m101 4 hours ago

It’s not clear at all because model training upfront costs and how you depreciate them are big unknowns, even for deprecated models. See my last comment for a bit more detail.

by simonw 1 hour ago

They are obviously losing money on training. I don't think they are selling inference for less than what it costs to serve these tokens.

That really matters. If they are making a margin on inference they could conceivably break even no matter how expensive training is, provided they sign up enough paying customers.

If they lose money on every paying customer then building great products that customers want to pay for will just make their financial situation worse.
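
A back-of-the-envelope sketch of that break-even argument. Every number here is hypothetical and chosen only for illustration; none of them come from any lab's actual financials, and the function name is made up for this example.

```python
import math

def customers_to_break_even(training_cost, revenue_per_customer, serving_cost_per_customer):
    """Paying customers needed for the inference margin to cover a fixed training cost.

    Returns None when the per-customer margin is zero or negative, because
    adding customers then never recovers the training spend.
    """
    margin = revenue_per_customer - serving_cost_per_customer
    if margin <= 0:
        return None
    return math.ceil(training_cost / margin)

# Hypothetical figures: $1B training cost, $240/year revenue per customer.
positive_margin = customers_to_break_even(1_000_000_000, 240, 140)  # $100 margin
negative_margin = customers_to_break_even(1_000_000_000, 240, 300)  # -$60 margin
print(positive_margin)
print(negative_margin)
```

With a positive margin the answer is a finite (if large) customer count; with a negative margin there is no customer count that works, which is the distinction being drawn.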

by ACCount37 2 hours ago

By now, model lifetime inference compute is >10x model training compute, for mainstream models. Further amortized by things like base model reuse.

by emp17344 1 hour ago

Sure, but if they stop training new models, the current models will be useless in a few years as our knowledge base evolves. They need to continually train new models to have a useful product.

by magicalist 5 hours ago

> They've said this directly and analysts agree [1]

Chasing down a few sources in that article leads to articles like this one at the root of the claims [1], which is based entirely on information "according to a person with knowledge of the company’s financials", which doesn't exactly fill me with confidence.

[1] https://www.theinformation.com/articles/openai-getting-effic...

by simonw 1 hour ago

"according to a person with knowledge of the company’s financials" is how professional journalists tell you that someone who they judge to be credible has leaked information to them.

I wrote a guide to deciphering that kind of language a couple of years ago: https://simonwillison.net/2023/Nov/22/deciphering-clues/

by 9cb14c1ec0 5 hours ago

It's also true that their inference costs are being heavily subsidized. For example, if you calculate Oracle's debt into OpenAI's revenue, they would be incredibly far underwater on inference.

by NitpickLawyer 6 hours ago

> they still are subsidizing inference costs.

They are for sure subsidising costs on all-you-can-prompt packages (20-100-200$ /mo). They do that for data gathering mostly, and to a lesser degree for user retention.

> evidence at all that Anthropic or OpenAI is able to make money on inference yet.

You can infer that from what 3rd party inference providers are charging. The largest open models atm are dsv3 (~650B params) and kimi2.5 (1.2T params). They are being served at 2-2.5-3$ /Mtok. That's sonnet / gpt-mini / gemini3-flash price range. You can make some educated guesses that they get some leeway for model size at the 10-15$/ Mtok prices for their top tier models. So if they are inside some sane model sizes, they are likely making money off of token based APIs.

by slopusila 2 hours ago

most of those subscriptions go unused. I barely use 10% of mine

so my unused tokens compensate for the few heavy users
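
A rough sketch of how that cross-subsidy could work out. The subscriber mix and serving costs below are invented for illustration, not measured from any provider.

```python
def subscription_margin(price, usage_costs):
    """Average monthly margin per subscriber, given each subscriber's serving cost."""
    average_cost = sum(usage_costs) / len(usage_costs)
    return price - average_cost

# Hypothetical pool: nine light users costing $2/month to serve,
# one heavy user costing $150/month, all paying $20/month.
costs = [2.0] * 9 + [150.0]
print(subscription_margin(20.0, costs))

# The same plan sold only to heavy users would lose money on every seat.
print(subscription_margin(20.0, [150.0]))
```

Whether a flat plan is profitable then depends on the ratio of light to heavy users, which is not public.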

by mrandish 5 hours ago

> I have not seen any reporting or evidence at all that Anthropic or OpenAI is able to make money on inference yet.

Anthropic planning an IPO this year is a broad meta-indicator that internally they believe they'll be able to reach break-even sometime next year on delivering a competitive model. Of course, their belief could turn out to be wrong but it doesn't make much sense to do an IPO if you don't think you're close. Assuming you have a choice with other options to raise private capital (which still seems true), it would be better to defer an IPO until you expect quarterly numbers to reach break-even or at least close to it.

Despite the willingness of private investment to fund hugely negative AI spend, the recently growing twitchiness of public markets around AI ecosystem stocks indicates they're already worried prices have exceeded near-term value. It doesn't seem like they're in a mood to fund oceans of dotcom-like red ink for long.

by WarmWash 5 hours ago

IPO'ing is often what you do to give your golden investors an exit hatch to dump their shares on the notoriously idiotic and hype driven public.

by barrkel 6 hours ago

> evidence at all that Anthropic or OpenAI is able to make money on inference yet.

The evidence is in third party inference costs for open source models.

by replwoacause 8 minutes ago

My experience trying to use Opus 4.5 on the Pro plan has been terrible. It blows up my usage very very fast. I avoid it altogether now. Yes, I know they warn about this, but it's comically fast how quickly it happens.

by nubg 6 hours ago

> "engineers optimizing inferencing"

are we sure this is not a fancy way of saying quantization?

by simonw 1 hour ago

The o3 optimizations were not quantization, they confirmed this at the time.

by bityard 4 hours ago

When MP3 became popular, people were amazed that you could compress audio to 1/10th its size with minor quality loss. A few decades later, we have audio compression that is much better and higher-quality than MP3, and they took a lot more effort than "MP3 but at a lower bitrate."

The same is happening in AI research now.

by embedding-shape 6 hours ago

Or distilled models, or just slightly smaller models but same architecture. Lots of options, all of them conveniently fitting inside "optimizing inferencing".

by esafak 5 hours ago

Someone made a quality tracker: https://marginlab.ai/trackers/claude-code/

by jmalicki 6 hours ago

A ton of GPU kernels are hugely inefficient. Not saying the numbers are realistic, but look at the 100s of times of gain in the Anthropic performance takehome exam that floated around on here.

And if you've worked with pytorch models a lot, having custom fused kernels can be huge. For instance, look at the kind of gains to be had when FlashAttention came out.

This isn't just quantization, it's actually just better optimization.

Even when it comes to quantization, Blackwell has far better quantization primitives and new floating point types that support row or layer-wise scaling that can quantize with far less quality reduction.

There is also a ton of work in the past year on sub-quadratic attention for new models that gets rid of a huge bottleneck, but like quantization can be a tradeoff, and a lot of progress has been made there on moving the Pareto frontier as well.

It's almost like when you're spending hundreds of billions on capex for GPUs, you can afford to hire engineers to make them perform better without just nerfing the models with more quantization.

by Der_Einzige 6 hours ago

"This isn't X, it's Y" with extra steps.

by jmalicki 4 hours ago

I'm flattered you think I wrote as well as an AI.

by nubg 3 hours ago

lmao

by sumitkumar 5 hours ago

It seems it is true for gemini because they have a humongous sparse model but it isn't so true for the max performance opus-4.5/6 and gpt-5.2/3.

by Aurornis 6 hours ago

> A year or more ago, I read that both Anthropic and OpenAI were losing money on every single request even for their paid subscribers

This gets repeated everywhere but I don't think it's true.

The company is unprofitable overall, but I don't see any reason to believe that their per-token inference prices are below the marginal cost of computing those tokens.

It is true that the company is unprofitable overall when you account for R&D spend, compensation, training, and everything else. This is a deliberate choice that every heavily funded startup should be making, otherwise you're wasting the investment money. That's precisely what the investment money is for.

However I don't think using their API and paying for tokens has negative value for the company. We can compare to models like DeepSeek where providers can charge a fraction of the price of OpenAI tokens and still be profitable. OpenAI's inference costs are going to be higher, but they're charging such a high premium that it's hard to believe they're losing money on each token sold. I think every token paid for moves them incrementally closer to profitability, not away from it.

by 3836293648 5 hours ago

The reports I remember show that they're profitable per-model, but overlap R&D so that the company is negative overall. And therefore will turn a massive profit if they stop making new models.

by schnable 3 hours ago

* stop making new models and people keep using the existing models, not switch to a competitor still investing in new models.

by trcf23 5 hours ago

Doesn’t it also depend on averaging with free users?

by runarberg 5 hours ago

I can see a case for omitting R&D when talking about profitability, but omitting training makes no sense. Training is what makes the model; omitting it is like omitting the cost of running the production facility of a car manufacturer. If AI companies stop training they will stop producing models, and they will run out of products to sell.

by vidarh 4 hours ago

The reason for this is that the cost scales with the model and training cadence, not usage, so they hope to scale the number of inference tokens sold by increasing use and/or slowing the training cadence as competitors are also forced to aim for overall profitability.

It is essentially a big game of venture capital chicken at present.

by Aurornis 4 hours ago

It depends on what you're talking about

If you're looking at overall profitability, you include everything

If you're talking about unit economics of producing tokens, you only include the marginal cost of each token against the marginal revenue of selling that token

by runarberg 3 hours ago

I don’t understand the logic. Without training, the marginal cost of each token buys nothing. The more you train, the better the model, and (presumably) the more customer interest you will gain. Unlike R&D, you will always have to train new models if you want to keep your customers.

To me this looks like some creative bookkeeping, or even wishful thinking. It is as if SpaceX omitted the price of the satellites when calculating their profits.

by nodja 4 hours ago

> A year or more ago, I read that both Anthropic and OpenAI were losing money on every single request even for their paid subscribers, and I don't know if that has changed with more efficient hardware/software improvements/caching.

This is obviously not true, you can use real data and common sense.

Just look up a similar sized open weights model on openrouter and compare the prices. You'll note the similar sized model is often much cheaper than what anthropic/openai provide.

Example: Let's compare claude 4 models with deepseek. Claude 4 is ~400B params so it's best to compare with something like deepseek V3 which is 680B params.

Even if we compare the cheapest claude model to the most expensive deepseek provider, we have claude charging $1/M for input and $5/M for output, while deepseek providers charge $0.4/M and $1.2/M, roughly a quarter of the output price. You can get it as cheap as $0.27 input and $0.4 output.
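
To make the comparison concrete, here is a small sketch that computes the ratios from the prices quoted in this comment. The prices are taken at face value from the text; the variable and function names are just illustrative.

```python
# Prices in USD per million tokens, as quoted in the comment.
claude = {"input": 1.00, "output": 5.00}
deepseek_high = {"input": 0.40, "output": 1.20}  # most expensive provider quoted
deepseek_low = {"input": 0.27, "output": 0.40}   # cheapest provider quoted

def price_ratio(other, baseline):
    """Each price in `other` as a fraction of the matching price in `baseline`."""
    return {kind: other[kind] / baseline[kind] for kind in baseline}

for name, prices in [("most expensive DeepSeek provider", deepseek_high),
                     ("cheapest DeepSeek provider", deepseek_low)]:
    ratios = price_ratio(prices, claude)
    print(name, {kind: f"{ratio:.0%}" for kind, ratio in ratios.items()})
```

On these figures the gap is larger on output tokens than on input tokens, so the overall discount depends on the input/output mix of the workload.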

As you can see, even if we skew things overly in favor of claude, the story is clear: claude token prices are much higher than they could've been. The difference in prices is because anthropic also needs to pay for training costs, while openrouter providers just need to worry about making serving models profitable. Deepseek is also not as capable as claude, which also puts downward pressure on the prices.

There's still a chance that anthropic/openai models are losing money on inference, for example if they're somehow much larger than expected. The 400B param number is not official, just speculation from how it performs. This also only takes API prices into account; subscriptions and free users will of course skew the real profitability numbers.

Price sources:

https://openrouter.ai/deepseek/deepseek-v3.2-speciale

https://claude.com/pricing#api

by Someone1234 4 hours ago

> This is obviously not true, you can use real data and common sense.

It isn't "common sense" at all. You're comparing several companies losing money, to one another, and suggesting that they're obviously making money because one is under-cutting another more aggressively.

LLM/AI ventures are all currently under-water with massive VC or similar money flowing in, they also all need training data from users, so it is very reasonable to speculate that they're in loss-leader mode.

by nodja 3 hours ago

Doing some math in my head, buying the GPUs at retail price, it would take probably around half a year to make the money back, probably more depending how expensive electricity is in the area you're serving from. So I don't know where this "losing money" rhetoric is coming from. It's probably harder to source the actual GPUs than making money off them.

by suddenlybananas 1 hour ago

electricity

by Havoc 6 hours ago

Saw a comment earlier today about google seeing a big (50%+) fall in Gemini serving cost per unit across 2025 but can’t find it now. Was either here or on Reddit

by mattddowney 6 hours ago

From Alphabet 2025 Q4 Earnings call: "As we scale, we’re getting dramatically more efficient. We were able to lower Gemini serving unit costs by 78% over 2025 through model optimizations, efficiency and utilization improvements." https://abc.xyz/investor/events/event-details/2026/2025-Q4-E...

by Havoc 3 hours ago

Thanks! That's the one

by m101 4 hours ago

I think actually working out whether they are losing money is extremely difficult for current models but you can look backwards. The big uncertainties are:

1) how do you depreciate a new model? What is its useful life? (Only know this once you deprecate it)

2) how do you depreciate your hardware over the period you trained this model? Another big unknown and not known until you finally write the hardware off.

The easy thing to calculate is whether you are making money actually serving the model. And the answer is almost certainly yes they are making money from this perspective, but that’s missing a large part of the cost and is therefore wrong.

by 3abiton 6 hours ago

It's not just that. Everyone is complacent with the utilization of AI agents. I have been using AI for coding for quite a while, and most of my "wasted" time is correcting its trajectory and guiding it through the thinking process. It's very fast iterations but it can easily go off track. Claude's family are pretty good at doing chained tasks, but still once the task becomes too big context-wise, it's impossible to get back on track. Cost-wise, it's cheaper than hiring skilled people, that's for sure.

by lufenialif2 6 hours ago

Cost wise, doesn’t that depend on what you could be doing besides steering agents?

by cyanydeez 5 hours ago

Isn't the quote something like: "If these LLMs are so good at producing products, where are all those products?"

by zozbot234 6 hours ago

> i.e. plans/API calls that make this practical at scale are expensive

Local AIs make agent workflows a whole lot more practical. Making the initial investment for a good homelab/on-prem facility will effectively become a no-brainer given the advantages on privacy and reliability, and you don't have to fear rugpulls or VCs playing the "lose money on every request" game since you know exactly how much you're paying in power costs for your overall load.

by vbezhenar 5 hours ago

I don't care about privacy and I haven't had many problems with the reliability of AI companies. Spending a ridiculous amount of money on hardware that's going to be obsolete in a few years and won't be utilized at 100% during that time is not something that many people would do, IMO. Privacy is good when it's given for free.

I would rather spend money on some pseudo-local inference (where a cloud company manages everything for me and I can just specify some open source model and pay for GPU usage).

by slopusila 2 hours ago

on-prem economics don't work because you can't batch requests, unless you are able to run 100 agents at the same time all the time
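
A toy model of why batching matters so much for cost per token. It assumes throughput scales linearly with the number of concurrent streams, which is a simplification (real servers saturate at some batch size), and the GPU price and per-stream speed are hypothetical.

```python
def cost_per_million_tokens(gpu_cost_per_hour, tokens_per_sec_per_stream, concurrent_streams):
    """Serving cost per million tokens if throughput scales linearly with batch size.

    Linear scaling is a simplifying assumption, not a measured property.
    """
    tokens_per_hour = tokens_per_sec_per_stream * concurrent_streams * 3600
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000

# Hypothetical: a $2/hour GPU generating 50 tokens/s per stream.
for streams in (1, 10, 100):
    print(streams, round(cost_per_million_tokens(2.0, 50, streams), 2))
```

Under these assumptions a single user pays the whole hourly GPU cost for one stream of tokens, while a provider batching 100 streams spreads the same cost over 100 times as many tokens.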

by KaiserPro 5 hours ago

Gemini-pro-preview is on ollama and requires an H100, which is ~$15-30k. Google are charging $3 per million tokens. Supposedly it's capable of generating between 1 and 12 million tokens an hour.

Which is profitable, but not by much.
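
Taking the quoted figures at face value, a quick sketch of the payback time on the hardware. It ignores power, hosting, and idle time, so it is a lower bound under the comment's own assumptions rather than a real cost model.

```python
def payback_hours(hardware_cost, price_per_million_tokens, million_tokens_per_hour):
    """Hours at full utilization for token revenue to equal the hardware price.

    Ignores electricity, hosting, and idle time.
    """
    revenue_per_hour = price_per_million_tokens * million_tokens_per_hour
    return hardware_cost / revenue_per_hour

# Figures quoted above: $15-30k hardware, $3 per million tokens, 1-12M tokens/hour.
for hardware_cost in (15_000, 30_000):
    for throughput in (1, 12):
        hours = payback_hours(hardware_cost, 3.0, throughput)
        print(f"${hardware_cost} at {throughput}M tok/h: "
              f"{hours:.0f} hours (~{hours / 24:.0f} days)")
```

The spread between the low and high throughput figures dominates the result, so the conclusion depends heavily on which end of the 1-12M range is realistic.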

by grim_io 3 hours ago

What do you mean it's on ollama and requires h100? As a proprietary google model, it runs on their own hardware, not nvidia.

by KaiserPro 2 hours ago

Sorry, a lack of context:

https://ollama.com/library/gemini-3-pro-preview

You can run it on your own infra. Anthropic and openAI are running off nvidia, and so are meta (well, supposedly they had custom silicon, I'm not sure if it's capable of running big models) and mistral.

However, if google really are running their own inference hardware, then that means the cost is different (developing silicon is not cheap...), as you say.

by simonw 1 hour ago

You can't run Gemini 3 Pro Preview on your own infrastructure. Ollama sell access to cloud models these days. It's a little weird and confusing.

by zozbot234 2 hours ago

That's a cloud-linked model. It's about using ollama as an API client (for ease of compatibility with other uses, including local), not running that model on local infra. Google does release open models (called Gemma) but they're not nearly as capable.

by Bombthecat 6 hours ago

That's why anthropic switched to tpu, you can sell at cost.

by WarmWash 5 hours ago

These are intro prices.

This is all straight out of the playbook. Get everyone hooked on your product by being cheap and generous.

Raise the price to backpay what you gave away plus cover current expenses and profits.

In no way shape or form should people think these $20/mo plans are going to be the norm. From OpenAI's marketing plan, and a general 5-10 year ROI horizon for AI investment, we should expect AI use to cost $60-80/mo per user.

by esafak 46 minutes ago

The models in 5-10 years are going to be unimaginably good. $100/month will be a bargain for knowledge workers, if they survive.
