Google is crushing Nvidia on inference. By TPUv9, they could be 4x more energy efficient and cheaper overall (even if Nvidia cuts their margins from 75% to 40%).
Cerebras will be substantially better for agentic workflows in terms of speed.
And if you care less about speed and more about cost and energy, Google will still crush Nvidia.
And Nvidia won't be cheaper for training new models either. By 2028, the vast majority of chips will be going to inference rather than training anyway.
Nvidia has no manufacturing reliability story. Anyone can buy TSMC's output.
Power is the bottleneck in the US (and everywhere besides China). By TPUv9, Google is projected to be 4x more energy efficient. It's a no-brainer who you're going with, starting with TPUv8, when Google lets you run on-prem.
These are GW-scale data centers. You can't just build 4 large-scale nuclear power plants in a year in the US (or anywhere, even China). You can't just build 4 GW of solar farms in a year in the US to power your less efficient data center. Maybe you could in China (if the economics were on your side, but they aren't). You sure as hell can't do it anywhere else (except maybe India).
What am I missing? I don't understand how Nvidia could've been so far ahead and just let every part of the market slip away.
Which part of the market has slipped away, exactly? Everything you wrote is supposition and extrapolation. Nvidia has a chokehold on the entire market. All other players still exist in the small pockets that Nvidia doesn't have enough production capacity to serve. And their dev ecosystem is still so far ahead of anyone else. Which provider gets chosen to equip a 100k-chip data center goes so far beyond raw chip power.
You're obviously not looking at expected forward orders for 2026 and 2027.
Largest production capacity maybe?
Also, market demand will be so high that every player's chips will be sold out.
Anyone can buy TSMC's output...
Only major roadblock is CUDA...
Defects are best measured on a per-wafer basis, not per-chip. So if your chips are huge and you can only fit 4 chips on a wafer, 1 defect can cut your yield by 25%. If they're smaller and you fit 100 chips on a wafer, then 1 defect on the wafer only cuts yield by 1%. Of course, there's more to this once you start reading about "binning", fusing off cores, etc.
There's plenty of information out there about how CPU manufacturing works, why defects happen, and how they're handled. Suffice to say, the comment makes perfect sense.
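To make the arithmetic above concrete, here's a minimal sketch using the standard Poisson yield approximation (yield ≈ e^(−D·A), where D is defect density and A is die area). The defect density and die areas below are made-up illustrative numbers, not real foundry figures, and the model ignores binning and core fusing, which recover some of the loss in practice.

```python
import math

def poisson_yield(defect_density_per_cm2: float, die_area_cm2: float) -> float:
    """Expected fraction of good dies under a simple Poisson defect model:
    a die is good only if it catches zero defects."""
    return math.exp(-defect_density_per_cm2 * die_area_cm2)

# Illustrative (made-up) defect density: 0.1 defects per cm^2.
D = 0.1

for name, area_cm2 in [("small die, ~1 cm^2 (hundreds per wafer)", 1.0),
                       ("huge die, ~8 cm^2 (a handful per wafer)", 8.0)]:
    y = poisson_yield(D, area_cm2)
    print(f"{name}: {y:.1%} of dies defect-free")

# Same number of defects land on the wafer either way, but the bigger the die,
# the larger the share of the wafer one defect takes out -- the per-wafer vs
# per-chip point made in the comment above.
```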
Yields on silicon are great, but not perfect
Their only chance is an acquihire, but Nvidia just spent $20b on Groq instead. Dead man walking.
Can always build a bigger hall
On the other hand, competition is good - nvidia can’t have the whole pie forever.
And that's the point - what's "reasonable" depends on the hardware and is far from fixed. Some users here are saying that this model is "blazing fast" but a bit weaker than expected, and one might've guessed as much.
> On the other hand, competition is good - nvidia can’t have the whole pie forever.
Sure, but arguably the closest thing to competition for nVidia is TPUs and future custom ASICs that will likely save a lot on energy used per model inference, while not focusing all that much on being super fast.
I disagree. Yes, it does matter, but because the popular interface is via chat, streaming the results of inference feels better to the squishy, messy, gross human operating the chat, even if it ends up taking longer. You can give all the benchmark results you want; humans aren't robots. They aren't data-driven, they have feelings, and they're going to go with what feels better. That isn't true for all uses, but time to first byte is ridiculously important for human-computer interaction.
Compare the photos of a Cerebras deployment to a TPU deployment.
https://www.nextplatform.com/wp-content/uploads/2023/07/cere...
https://assets.bwbx.io/images/users/iqjWHBFdfxIU/iOLs2FEQxQv...
The difference is striking.
Let's not forget that the CEO is an SEC felon who got caught trying to pull a fast one.
Training models needs everything in one DC, inference doesn't.
Terrible yield: one defect can ruin a whole wafer instead of just a chip region. Poor perf/cost (see above). Difficult to program. Little space for RAM.