I'm generally with you on all of these ideas.

However, Google probably won't catch up. Nvidia has been winning in spite of the fact that their hardware is general purpose rather than tuned for inference.

Rubin has architectural differences I don't understand that are supposed to make inference much cheaper and faster while still retaining those other more generic capabilities. Their next generation after that is going to do even better at being fast for inference and general purpose.

Google is betting that their TPUs won't depreciate faster than the markup they have to pay to Nvidia. I don't think they will be right.

reply
Why do people who don't follow A100 prices talk like they know things about GPU pricing dynamics?

A100s are ~7 years old and going for more than 2 dollars an hour, significantly more expensive than even 2 years ago. This is because anything made by Nvidia with 80GB of VRAM or more will have an economically useful lifespan of like 10 years.

I could see H100s getting 12 years.

Michael Burry doesn't know shit about GPUs.

reply
So I was curious about how A100s would do running DeepSeek v4. I can't find any instances of running v4 Pro on even an 8xA100 cluster, so you need to run Flash at ~284B params. A100s don't support FP8, so you're running FP16 and taking a hit that way. But I see estimates of 30-50 tok/s for an 8xA100 cluster. They're drawing 300-400W each, so you're looking at probably 3500+ watts in total, which is roughly 0.01 tok/s per watt (about 0.01 tokens per joule).
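
To make the arithmetic explicit: tokens per second divided by watts is the same thing as tokens per joule. A rough sketch using only the estimates above (30-50 tok/s, ~3500 W for the cluster); the function and variable names are just illustrative:

```python
# Back-of-the-envelope efficiency for the 8xA100 estimate quoted above.
# Inputs are the rough figures from this comment, not measurements.

def tokens_per_joule(tok_per_s: float, watts: float) -> float:
    """(tokens / second) / (joules / second) = tokens / joule."""
    return tok_per_s / watts

CLUSTER_WATTS = 3500.0      # assumed total draw for 8 GPUs plus overhead
JOULES_PER_KWH = 3.6e6      # 1 kWh = 3.6 million joules

low = tokens_per_joule(30.0, CLUSTER_WATTS)
high = tokens_per_joule(50.0, CLUSTER_WATTS)

print(f"{low:.4f} to {high:.4f} tokens per joule")
print(f"{low * JOULES_PER_KWH:,.0f} to {high * JOULES_PER_KWH:,.0f} tokens per kWh")
```

So "0.01" is the right order of magnitude, and it works out to somewhere in the tens of thousands of tokens per kWh.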

Now jump ahead 2 years and you seem to have a massive jump in performance [1]. The tokens/Watt goes up by at least 2 orders of magnitude. And the B100 is 3-4x that. And we're about to hit the R100 (Rubin) cliff.

That's what this is going to come down to. When hyperscaler DCs are getting to gigawatt power usage, it all comes down to power efficiency. Those A100s aren't far from being sold for scrap.

I've been looking into how different companies are handling depreciation for this. Amazon seems to be saying the life is 3-4 years, Google 4-5 and Meta is saying 8+, which I think is wildly optimistic.
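
For a sense of how much the assumed life matters, here's a quick straight-line sketch. The $30,000 purchase price is a made-up placeholder, not a real quote, and I'm assuming zero salvage value; the useful lives are picked from the ranges mentioned above:

```python
# Straight-line depreciation under different assumed useful lives.
# PRICE_USD is a hypothetical placeholder, not an actual GPU price.

PRICE_USD = 30_000.0
HOURS_PER_YEAR = 24 * 365

def annual_depreciation(price: float, life_years: float) -> float:
    """Straight-line with zero salvage value: equal expense every year."""
    return price / life_years

annual = {life: annual_depreciation(PRICE_USD, life) for life in (3, 4, 5, 8)}

for life, expense in annual.items():
    per_hour = expense / HOURS_PER_YEAR
    print(f"{life}-year life: ${expense:,.0f}/year, ${per_hour:.2f}/hour")
```

Going from a 3-year to an 8-year schedule cuts the annual expense on the same hardware by more than half, which is why the choice of useful life moves reported earnings so much.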

[1]: https://lambda.ai/inference-models/deepseek-ai/deepseek-v4-f...

reply
You're focussing on inference ... is it not more likely that A100s are being used for training/fine-tuning?