Pretty much every major American inference provider claims to make a profit on API-based inference. Consumer plans might be subsidized overall, but it's hard to say, since they're a black box and some consumers don't fully use their plans.
reply
All of them. It's simply impossible to sell tokens by usage at a loss now. You'll be arbitraged to death in a few days. It only makes sense to subsidize cost if you're selling a subscription.
reply
Third parties selling open-weight inference on OpenRouter are surely selling at a profit. There's zero reason to subsidize it.
reply
Selling inference is not fundamentally different from selling compute - you amortize the lifetime cost of owning and operating the GPUs and then turn that into a per-token price. The risk of loss would be if there is low demand (and thus your facilities run underutilized), but I doubt inference providers are suffering from this.
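
To make the amortization concrete, here is a rough sketch of that calculation in Python. Every name and number in it is a hypothetical placeholder (not a real price, throughput, or utilization figure from any provider), and it ignores things a real operator would include, such as financing, networking, and resale value.

```python
def cost_per_million_tokens(
    gpu_price_usd: float,
    lifetime_years: float,
    hourly_opex_usd: float,
    tokens_per_second: float,
    utilization: float,
) -> float:
    """Amortized cost of serving one million tokens on a single GPU.

    All parameters are illustrative assumptions:
    - gpu_price_usd: up-front purchase price, written off over the lifetime
    - lifetime_years: assumed useful life of the hardware
    - hourly_opex_usd: power, cooling, hosting, etc. per hour of ownership
    - tokens_per_second: throughput when the GPU is busy
    - utilization: fraction of owned hours spent serving paid traffic
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    hours = lifetime_years * 365 * 24
    # Capital cost plus operating cost over the whole ownership period.
    total_cost = gpu_price_usd + hourly_opex_usd * hours
    # Tokens are only produced during the utilized fraction of those hours.
    total_tokens = tokens_per_second * utilization * hours * 3600
    return total_cost / total_tokens * 1_000_000


if __name__ == "__main__":
    # Made-up inputs, chosen only to show how the knobs interact.
    for years in (2, 4, 6):
        for util in (0.3, 0.8):
            cost = cost_per_million_tokens(
                gpu_price_usd=30_000,
                lifetime_years=years,
                hourly_opex_usd=1.0,
                tokens_per_second=2_000,
                utilization=util,
            )
            print(f"lifetime={years}y utilization={util:.0%}: ${cost:.3f} per 1M tokens")
```

The break-even per-token price moves with two inputs the operator cannot know in advance: how busy the hardware stays and how many years it remains worth running.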

Where the long-term payoff still seems speculative is for companies doing training rather than just inference.

reply
There’s a lot of debate over what the useful lifespan of the hardware is, though. A number that seems very vibes-based determines whether these datacenters are a good investment or a disastrous one.
reply
I specifically remember this debate coming up when the H100 was the only player on the table and AMD came out with a card that was almost as fast, at least in benchmarks, but at about half the cost. I haven't seen a follow-up with real-world use, though. As a homelabber, I know that in the last three weeks support for AMD hardware has gotten impressively useful, covering even CUDA if you enjoy pain and suffering.

What I'm curious about is the other stuff out there, such as the ARM and tensor chips.

reply
Google definitely makes money in other areas. Do they make money on inference?
reply