Cache-hit input tokens are incredibly cheap for them to serve (and incredibly high margin too, except for DeepSeek).
Uncached input tokens sit in the middle on cost, because input tokens can be processed very efficiently.
Also, his math is wrong. $100k gets you 22.7B output tokens at $4.4/M, which is how much GLM 5.2 costs.
At 500 tokens/s, 22.7B tokens is only about 526 days, or roughly 1.44 years. That is much less than the life of the hardware.
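For anyone who wants to check the arithmetic, here is a quick sketch. The three constants are just the figures quoted in this comment ($100k budget, $4.4 per million output tokens, 500 tokens/s); the function names are made up for illustration, and the 500 tokens/s is treated as a single sustained rate running around the clock.

```python
# Figures quoted in the comment above; treated as assumptions, not measurements.
BUDGET_USD = 100_000          # hardware budget being compared against
PRICE_PER_M_OUTPUT = 4.4      # USD per million output tokens
THROUGHPUT_TOK_S = 500        # sustained output tokens per second

SECONDS_PER_DAY = 86_400


def tokens_for_budget(budget_usd: float, price_per_million: float) -> float:
    """Number of output tokens the budget buys at the given API price."""
    return budget_usd / price_per_million * 1_000_000


def days_to_generate(tokens: float, tokens_per_second: float) -> float:
    """Days of nonstop generation needed to produce that many tokens."""
    return tokens / tokens_per_second / SECONDS_PER_DAY


tokens = tokens_for_budget(BUDGET_USD, PRICE_PER_M_OUTPUT)
days = days_to_generate(tokens, THROUGHPUT_TOK_S)
years = days / 365

print(f"{tokens / 1e9:.1f}B tokens, {days:.0f} days, {years:.2f} years")
```

Under those assumptions the break-even lands at a bit under a year and a half of continuous use; a higher effective throughput would shorten it proportionally.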
concurrency