You're going off vibes alone. This is easily verified; please go verify it.
What makes you think they have zero reason to subsidize? Because the providers aren't household names, you assume they wouldn't operate at a loss? What's your logic here? You make no sense.
Also, a lot of money is being made on input tokens and cached tokens, which are much cheaper to compute.
DeepSeek published their math for serving the V3/R1 models. By their own numbers, the theoretical cost-profit margin was 545%: https://github.com/deepseek-ai/open-infra-index/blob/main/20...
If Anthropic and OpenAI are subsidizing metered API usage, their business model is going to end up just as successful as MoviePass. They're already burning enough money on training.
If you have a machine running at 150 tok/s, you can only make about $5,830 a month at $15 per 1M tokens, running 24/7. It costs a hell of a lot more than $6k a month to run Claude 4.7 at 150 tok/s on that machine around the clock.
This math is a bit off because you have input tokens too, but regardless it's still not profitable, especially given how long it takes to turn around a request, and the caching is probably not all that profitable either.
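A quick sketch of that arithmetic, assuming a 30-day month of continuous single-stream output. The input-token ratio and input price are hypothetical knobs for the "you have input tokens too" caveat, not figures from any provider's price list.

```python
SECONDS_PER_MONTH = 60 * 60 * 24 * 30  # 30-day month, running 24/7

def monthly_revenue(output_tok_per_s: float,
                    output_price_per_m: float,
                    input_to_output_ratio: float = 0.0,
                    input_price_per_m: float = 0.0) -> float:
    """Dollars earned in one month of continuous generation.

    input_to_output_ratio and input_price_per_m are hypothetical
    parameters; both default to zero (output tokens only).
    """
    output_tokens = output_tok_per_s * SECONDS_PER_MONTH
    input_tokens = output_tokens * input_to_output_ratio
    return (output_tokens * output_price_per_m
            + input_tokens * input_price_per_m) / 1_000_000

# Output only: 150 tok/s * 2,592,000 s = 388.8M tokens at $15/M.
print(monthly_revenue(150, 15.0))  # 5832.0

# Hypothetical: 10 input tokens per output token, billed at $3/M.
print(monthly_revenue(150, 15.0, 10, 3.0))  # 17496.0
```

The second call only shows that input tokens can move the total a lot; the real ratio and price depend on the workload and the provider.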
Serving models on dedicated hardware is not the same as your at-home 150 tok/s setup. Inference throughput is measured in thousands of tokens per second in aggregate (i.e., summed across all the sessions running in parallel). That's how they make money.
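A rough sketch of how aggregate throughput changes the revenue picture, assuming every session decodes at the same 150 tok/s and all tokens are billed at $15/M output. The 32 sessions and the $100,000/month cost are hypothetical placeholders, not real numbers for any provider.

```python
import math

SECONDS_PER_MONTH = 60 * 60 * 24 * 30  # 30-day month, running 24/7

def aggregate_tok_per_s(sessions: int, per_session_tok_per_s: float) -> float:
    # Throughput summed over all requests decoding concurrently.
    return sessions * per_session_tok_per_s

def monthly_output_revenue(tok_per_s: float, price_per_m: float) -> float:
    # Dollars per month if this rate is sustained around the clock.
    return tok_per_s * SECONDS_PER_MONTH * price_per_m / 1_000_000

def sessions_to_break_even(monthly_cost: float, per_session_tok_per_s: float,
                           price_per_m: float) -> int:
    # Smallest number of parallel sessions whose revenue covers the cost.
    per_session = monthly_output_revenue(per_session_tok_per_s, price_per_m)
    return math.ceil(monthly_cost / per_session)

print(aggregate_tok_per_s(32, 150))                                 # 4800
print(monthly_output_revenue(aggregate_tok_per_s(32, 150), 15.0))   # 186624.0
print(sessions_to_break_even(100_000, 150, 15.0))                   # 18
```

This ignores that per-session speed can drop as the batch grows, so treat it as an upper bound on what batching buys you.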
The reason it works: each time you read the model weights from memory (the memory-bound part) to calculate the next token, you can also advance multiple requests at once (the compute-bound part) at little extra cost. It's also much more energy-efficient per token.
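A toy roofline-style model of one decode step, to illustrate why batching helps: the weights are streamed from memory once per step and shared by the whole batch, while compute grows with batch size. All hardware numbers here are hypothetical round figures (roughly a 100B-parameter model at 8 bits, 4 TB/s of memory bandwidth, 1 PFLOP/s), and the model ignores KV-cache reads, attention cost, and interconnect overhead.

```python
def decode_step_seconds(batch: int, weight_bytes: float,
                        mem_bw_bytes_per_s: float,
                        flops_per_token: float, peak_flops: float) -> float:
    # Memory side: weights are read once per step, shared by every request.
    t_memory = weight_bytes / mem_bw_bytes_per_s
    # Compute side: each request in the batch needs its own matrix math.
    t_compute = batch * flops_per_token / peak_flops
    # Simplifying assumption: the slower of the two sets the step time.
    return max(t_memory, t_compute)

def throughput(batch: int, **hw: float) -> tuple[float, float]:
    step = decode_step_seconds(batch, **hw)
    per_session = 1.0 / step  # each request gets one token per step
    return per_session, batch * per_session

# Hypothetical hardware and model figures.
HW = dict(weight_bytes=100e9, mem_bw_bytes_per_s=4e12,
          flops_per_token=2e11, peak_flops=1e15)

for batch in (1, 64, 256):
    per, agg = throughput(batch, **HW)
    print(batch, round(per, 1), round(agg, 1))
# 1 40.0 40.0
# 64 40.0 2560.0
# 256 19.5 5000.0
```

Under these assumptions, going from 1 to 64 requests costs nothing in per-session speed because the step is still waiting on memory; past the crossover (about 125 here) the step becomes compute-bound and each session slows down while aggregate throughput keeps rising.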
The idea that everyone is spinning up $2 million in GPUs to scan their email inbox, search the web, or avoid learning something is still ridiculous to me regardless.