Google is crushing Nvidia on inference. By TPUv9, they could be 4x more energy efficient and cheaper overall (even if Nvidia cuts their margins from 75% to 40%).
Cerebras will be substantially better for agentic workflows in terms of speed.
And if you care less about speed and more about cost and energy, Google will still crush Nvidia.
And Nvidia won't be cheaper for training new models either. By 2028, the vast majority of chips will be going to inference rather than training anyway.
Nvidia has no manufacturing reliability story. Anyone can buy TSMC's output.
Power is the bottleneck in the US (and everywhere besides China). By TPUv9, Google is projected to be 4x more energy efficient. It's a no-brainer who you're going with, starting with TPUv8, when Google lets you run on-prem.
These are GW-scale data centers. You can't just build 4 large-scale nuclear power plants in a year in the US (or anywhere, even China). You can't just build 4 GW of solar farms in a year in the US to power your less efficient data center. Maybe you could in China (if the economics were on your side, but they aren't). You sure as hell can't do it anywhere else (except maybe India).
What am I missing? I don't understand how Nvidia could've been so far ahead and just let every part of the market slip away.
Which part of the market has slipped away, exactly? Everything you wrote is supposition and extrapolation. Nvidia has a chokehold on the entire market. All other players still exist in the small pockets that Nvidia doesn't have enough production capacity to serve. And their dev ecosystem is still so far ahead of anyone else. Which provider gets chosen to equip a 100k-chip data center goes so far beyond raw chip power.
You're obviously not looking at expected forward orders for 2026 and 2027.
Largest production capacity maybe?
Also, market demand will be so high that every player's chips will be sold out.
Anyone can buy TSMC's output...
Only major roadblock is CUDA...
Defects are best measured on a per-wafer basis, not per-chip. So if your chips are huge and you can only fit 4 chips on a wafer, 1 defect can cut your yield by 25%. If they're smaller and you fit 100 chips on a wafer, then 1 defect on the wafer only cuts yield by 1%. Of course, there's more to this once you start reading about "binning", fusing off cores, etc.
There's plenty of information out there about how CPU manufacturing works, why defects happen, and how they're handled. Suffice to say, the comment makes perfect sense.
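To make the arithmetic above concrete, here's a minimal sketch using the standard Poisson yield approximation (yield ≈ e^(−D·A), where D is defect density and A is die area). The defect density and die areas below are made-up illustrative numbers, not real foundry figures, and the model ignores binning and core fusing, which recover some of the loss in practice.

```python
import math

def poisson_yield(defect_density_per_cm2: float, die_area_cm2: float) -> float:
    """Expected fraction of good dies under a simple Poisson defect model:
    a die is good only if it catches zero defects."""
    return math.exp(-defect_density_per_cm2 * die_area_cm2)

# Illustrative (made-up) defect density: 0.1 defects per cm^2.
D = 0.1

for name, area_cm2 in [("small die, ~1 cm^2 (hundreds per wafer)", 1.0),
                       ("huge die, ~8 cm^2 (a handful per wafer)", 8.0)]:
    y = poisson_yield(D, area_cm2)
    print(f"{name}: {y:.1%} of dies defect-free")

# Same number of defects land on the wafer either way, but the bigger the die,
# the larger the share of the wafer one defect takes out -- the per-wafer vs
# per-chip point made in the comment above.
```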
Yields on silicon are great, but not perfect
Their only chance is an acquihire, but Nvidia just spent $20b on Groq instead. Dead man walking.
Can always build a bigger hall
On the other hand, competition is good - nvidia can’t have the whole pie forever.
And that's the point - what's "reasonable" depends on the hardware and is far from fixed. Some users here are saying that this model is "blazing fast" but a bit weaker than expected, and one might've guessed as much.
> On the other hand, competition is good - nvidia can’t have the whole pie forever.
Sure, but arguably the closest thing to competition for nVidia is TPUs and future custom ASICs that will likely save a lot on energy used per model inference, while not focusing all that much on being super fast.
I disagree. Yes, it does matter, but because the popular interface is via chat, streaming the results of inference feels better to the squishy, messy, gross human operating the chat, even if it ends up taking longer. You can give all the benchmark results you want; humans aren't robots. They aren't data-driven, they have feelings, and they're going to go with what feels better. That isn't true for all uses, but time to first byte is ridiculously important for human-computer interaction.
Compare the photos of a Cerebras deployment to a TPU deployment.
https://www.nextplatform.com/wp-content/uploads/2023/07/cere...
https://assets.bwbx.io/images/users/iqjWHBFdfxIU/iOLs2FEQxQv...
The difference is striking.
Let's not forget that the CEO is an SEC felon who got caught trying to pull a fast one.
Training models needs everything in one DC, inference doesn't.
Terrible yield: one defect can ruin a whole wafer instead of just a chip region. Poor perf/cost (see above). Difficult to program. Little space for RAM.