However, Google probably won't catch up. Nvidia has been winning even though its hardware is general-purpose rather than tuned for inference.
Rubin has architectural changes I don't understand that are supposed to make inference much cheaper and faster while keeping those more general-purpose capabilities. The generation after that is supposed to be even better at fast inference while staying general-purpose.
Google is betting that their TPUs won't depreciate faster than the markup they have to pay to Nvidia. I don't think they will be right.
A100s are ~7 years old and still rent for more than $2 an hour, significantly more than even 2 years ago. This is because anything made by Nvidia with 80 GB or more of VRAM will have an economically useful lifespan of something like 10 years.
I could see H100s getting 12 years.
Michael Burry doesn't know shit about GPUs.
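To put the rental-price argument in rough numbers, here is a minimal sketch of gross revenue per GPU over a given lifespan. The function name, the 70% utilization, and the flat hourly price are hypothetical assumptions, not figures from the thread; it ignores power, hosting, and discounting, so it is an upper bound on what a card earns, not a profit estimate.

```python
# Minimal sketch: undiscounted gross rental revenue for one GPU.
# Assumptions (hypothetical): flat hourly price, fixed utilization for the
# whole period, and no power, hosting, or financing costs subtracted.

HOURS_PER_YEAR = 24 * 365  # 8760


def gross_rental_revenue(price_per_hour: float, utilization: float, years: float) -> float:
    """Return gross revenue in dollars for one GPU rented over `years`."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return price_per_hour * utilization * HOURS_PER_YEAR * years


if __name__ == "__main__":
    # $2/hour echoes the A100 rental price above; 0.7 utilization is made up.
    for years in (3, 5, 10):
        revenue = gross_rental_revenue(2.0, 0.7, years)
        print(f"{years:>2} years: ${revenue:,.0f}")
```

The point of the sketch is that revenue scales linearly with lifespan, so whether a card earns for 5 years or 10 roughly doubles what it can justify costing.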
Now jump ahead 2 years and there seems to be a massive leap in performance [1]: tokens per watt go up by at least 2 orders of magnitude. The B100 is 3-4x that, and we're about to hit the R100 (Rubin) cliff.
That's what this is going to come down to. When hyperscaler data centers are reaching gigawatt-scale power draw, power efficiency is everything. Those A100s aren't far from being sold for scrap.
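A minimal sketch of the power-efficiency argument, assuming "tokens/Watt" means tokens per second per watt (i.e. tokens per joule). The 100 MW figure and both efficiency values are hypothetical placeholders; only the 100x ratio echoes the "2 orders of magnitude" claim. Once a site is capped on power, keeping old hardware running costs the throughput those same watts would produce on newer hardware.

```python
# Minimal sketch: opportunity cost of old hardware at a power-capped site.
# "tokens/Watt" is read as tokens/s per watt, i.e. tokens per joule.
# All numbers below are hypothetical placeholders, not measurements.


def throughput(power_w: float, tokens_per_joule: float) -> float:
    """Sustained tokens/s if power_w watts go entirely to inference."""
    return power_w * tokens_per_joule


def forgone_throughput(old_power_w: float, old_tpj: float, new_tpj: float) -> float:
    """Tokens/s given up by spending old_power_w on old hardware instead of new."""
    return throughput(old_power_w, new_tpj) - throughput(old_power_w, old_tpj)


if __name__ == "__main__":
    # Hypothetical: 100 MW of a gigawatt site still runs GPUs that are
    # 100x less efficient than the current generation.
    old_power_w = 100e6
    lost = forgone_throughput(old_power_w, old_tpj=1.0, new_tpj=100.0)
    print(f"Forgone throughput: {lost:.2e} tokens/s")
```

Under a power cap the real cost of old cards is not their purchase price but this forgone throughput, which grows with every efficiency jump.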
I've been looking into how different companies handle depreciation for this hardware. Amazon seems to be using a useful life of 3-4 years, Google 4-5, and Meta 8+, which I think is wildly optimistic.
[1]: https://lambda.ai/inference-models/deepseek-ai/deepseek-v4-f...
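To see why the useful-life assumptions quoted above (3-4, 4-5, and 8+ years) matter, here is a minimal straight-line depreciation sketch. The $30,000 cost and zero salvage value are hypothetical, and 8 stands in for "8+". Straight-line is the standard textbook method; whether each company uses exactly this schedule is an assumption.

```python
# Minimal sketch of straight-line depreciation under different useful lives.
# Hypothetical inputs: $30,000 purchase cost, zero salvage value.


def straight_line_annual(cost: float, salvage: float, life_years: float) -> float:
    """Annual depreciation expense under the straight-line method."""
    if life_years <= 0:
        raise ValueError("life_years must be positive")
    return (cost - salvage) / life_years


def book_value(cost: float, salvage: float, life_years: float, age_years: float) -> float:
    """Carrying value after age_years, never below salvage."""
    annual = straight_line_annual(cost, salvage, life_years)
    return max(salvage, cost - annual * age_years)


if __name__ == "__main__":
    COST = 30_000.0
    for life in (3, 4, 5, 8):
        annual = straight_line_annual(COST, 0.0, life)
        bv = book_value(COST, 0.0, life, 3)
        print(f"{life}-year life: ${annual:,.0f}/yr, book value after 3 yrs: ${bv:,.0f}")
```

A longer assumed life lowers the yearly expense, but if the hardware becomes uneconomic before its book value reaches salvage, the remaining value has to be written down all at once.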