by Ldorigo19 hours ago |

by kingstnap13 hours ago|

Output tokens are actually kinda expensive for the provider.

The input cache hit tokens are incredibly cheap for them (and incredibly high margin too, except for DeepSeek).

And input tokens are in the middle. Input tokens can be processed very efficiently.

Also, his math is wrong. $100k gets you 22.7B output tokens at $4.4/M, which is how much GLM 5.2 costs.

At 500 tokens/s, 22.7B tokens takes about 526 days, or about 1.44 years. Which is much less than the life of the hardware.
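
For anyone who wants to check the arithmetic, here is a minimal sketch. The $100k budget, $4.4/M price, and 500 tokens/s rate are the figures quoted above; the function name `payback_time` and the plain 365-day year are assumptions for illustration, and the sketch ignores power, utilization, and everything else that isn't the raw token math.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365  # assumption: plain 365-day year, no leap days


def payback_time(budget_usd, price_per_million_usd, tokens_per_second):
    """Return (tokens, days, years) needed to sell budget_usd worth of output tokens.

    Hypothetical helper: assumes a constant price and a constant
    sustained generation rate for the whole period.
    """
    tokens = budget_usd / price_per_million_usd * 1_000_000
    seconds = tokens / tokens_per_second
    days = seconds / SECONDS_PER_DAY
    return tokens, days, days / DAYS_PER_YEAR


tokens, days, years = payback_time(100_000, 4.4, 500)
print(f"{tokens / 1e9:.1f}B tokens, {days:.0f} days, {years:.2f} years")
# prints: 22.7B tokens, 526 days, 1.44 years
```

Swap in a different price or throughput to see how sensitive the payback period is to either number.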

by bandrami8 hours ago|

Inference providers have been getting a firehose of investor cash to keep the chips running (and are looking around very nervously as that firehose starts to sputter).

by ac2917 hours ago|

The inference providers are running batch sizes much larger than 10.

by dakolli14 hours ago|

https://aimultiple.com/gpu-benchmark

concurrency