Pretty much every major American inference provider claims to make a profit on API-based inference. Consumer plans might be subsidized overall, but it's hard to say, since they're a black box and some consumers don't fully use their plans.

by raincole, 30 minutes ago
All of them. It's simply impossible to sell tokens by usage at a loss now. You'll be arbitraged to death in a few days. It only makes sense to subsidize cost if you're selling a subscription.

by henry2023, 1 hour ago
Third parties selling open-weight inference on OpenRouter are surely selling at a profit. There's zero reason to subsidize it.

by wavemode, 2 hours ago
Selling inference is not fundamentally different from selling compute - you amortize the lifetime cost of owning and operating the GPUs and then turn that into a per-token price. The risk of loss would be if there is low demand (and thus your facilities run underutilized), but I doubt inference providers are suffering from this.

Where the long-term payoff still seems speculative is for companies doing training rather than just inference.
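
To make the amortization argument concrete, here is a rough sketch of how a per-token cost could fall out of hardware and operating costs. The function name and every number are hypothetical placeholders, not figures from any real GPU or provider; the only point is that cost per token scales inversely with utilization.

```python
def cost_per_million_tokens(capex_usd, lifespan_years, opex_usd_per_year,
                            tokens_per_second, utilization):
    """Amortized cost of serving one million tokens on one GPU (illustrative)."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    seconds_per_year = 365 * 24 * 3600
    # Straight-line depreciation of the purchase price, plus yearly running costs.
    yearly_cost = capex_usd / lifespan_years + opex_usd_per_year
    # Tokens actually served in a year, given how busy the hardware is.
    yearly_tokens = tokens_per_second * utilization * seconds_per_year
    return yearly_cost / yearly_tokens * 1_000_000


# Hypothetical inputs: $30k card, 5-year life, $4k/year power and hosting,
# 2,000 tokens/s of peak throughput.
busy = cost_per_million_tokens(30_000, 5, 4_000, 2_000, utilization=0.8)
idle = cost_per_million_tokens(30_000, 5, 4_000, 2_000, utilization=0.2)
print(f"80% utilized: ${busy:.3f} per 1M tokens")
print(f"20% utilized: ${idle:.3f} per 1M tokens")
```

Under these assumptions, dropping from 80% to 20% utilization makes each token four times as expensive to serve, which is the underutilization risk described above.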

by Gigachad, 2 hours ago
There's a lot of debate over what the useful lifespan of the hardware is, though. A number that seems very vibes-based determines whether these datacenters are a good investment or a disastrous one.
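
A quick sketch of why that one number matters so much, assuming straight-line depreciation. All figures and names here are made up for illustration; they are not estimates for any real datacenter.

```python
def yearly_margin(capex_usd, lifespan_years, opex_usd_per_year, revenue_usd_per_year):
    """Yearly profit per GPU after depreciating the purchase over its lifespan."""
    return revenue_usd_per_year - opex_usd_per_year - capex_usd / lifespan_years


def break_even_lifespan_years(capex_usd, opex_usd_per_year, revenue_usd_per_year):
    """Shortest lifespan at which the hardware pays for itself (illustrative)."""
    net = revenue_usd_per_year - opex_usd_per_year
    if net <= 0:
        raise ValueError("revenue does not cover operating costs at any lifespan")
    return capex_usd / net


# Hypothetical inputs: $30k card, $4k/year to run, $14k/year of inference revenue.
for years in (2, 3, 5, 7):
    margin = yearly_margin(30_000, years, 4_000, 14_000)
    print(f"{years}-year lifespan: {margin:+,.0f} USD per year")

print("break-even lifespan:", break_even_lifespan_years(30_000, 4_000, 14_000), "years")
```

With these placeholder inputs, the same card loses money if it is obsolete after two years and is comfortably profitable if it lasts five, so the assumed lifespan alone flips the conclusion.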

by hypercube33, 1 hour ago
I specifically remember this debate coming up when the H100 was the only player on the table and AMD came out with a card that was almost as fast, at least in benchmarks, for about half the cost. I haven't seen a follow-up with real-world use, though. As a home labber, I know that in the last three weeks AMD support has gotten impressively useful, even covering CUDA if you enjoy pain and suffering.

What I'm curious about is the other stuff out there, such as ARM and tensor chips.

by jagged-chisel, 3 hours ago
Google definitely makes money in other areas. Do they make money on inference?

[deleted] 3 hours ago