If you have a machine running at 150 tok/ps you can only make $5820 a month at $15 per 1mm running 24/7. It costs a hell of a lot more than 6k a month to run Claude 4.7 @ 150 tok/ps on that machine 24/7.
This math is a bit off, because you have input tokens too, but regardless its still not profitable especially for how long it takes to turn around a request and the caching is probably not all that profitable.
The reason it works: each time you read the model (memory bound) to calculate the next token, you can also update multiple requests (compute bound) while at it. It's also much more energy-efficient per token.
The idea that everyone is spinning up a $2 million in GPUs to scan their email inbox, search the web or avoid learning something is still ridiculous to me regardless.