Using an entire wafer for one chip would result in extremely low manufacturing yields. Even in state-of-the-art cleanrooms, some parts of every wafer will still come out with defects.

With CPUs and GPUs, chip makers can disable faulty cores and bin the chips as lower SKUs to salvage some yield. But if you're using an entire wafer to embed weights, and a speck of dust causes a patterning defect that makes some weights wrong, the entire wafer is worthless.
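
To put rough numbers on this, here is a minimal sketch using the simple Poisson yield model, where the chance of a defect-free die is exp(-D * A) for defect density D and die area A. The defect density of 0.1 per cm^2, the 300 mm wafer size, and the function names are my own illustrative assumptions, not figures from any real process.

```python
import math

def poisson_yield(defect_density_per_cm2, area_cm2):
    """Probability that a die of the given area has zero defects (Poisson model)."""
    return math.exp(-defect_density_per_cm2 * area_cm2)

def yield_with_tolerance(defect_density_per_cm2, area_cm2, tolerated_defects):
    """Probability of at most `tolerated_defects` defects, i.e. the die is
    still usable if that many faulty units can be disabled."""
    lam = defect_density_per_cm2 * area_cm2  # expected number of defects
    # Sum the Poisson probabilities in log space to avoid overflow for large lam.
    return sum(
        math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        for k in range(tolerated_defects + 1)
    )

# Assumed (hypothetical) numbers for illustration only.
DEFECT_DENSITY = 0.1                     # defects per cm^2
SMALL_DIE_CM2 = 1.0                      # a 1 cm^2 die
WAFER_CM2 = math.pi * 15.0 ** 2          # full 300 mm wafer, about 707 cm^2

if __name__ == "__main__":
    print("1 cm^2 die, no defects allowed:", poisson_yield(DEFECT_DENSITY, SMALL_DIE_CM2))
    print("whole wafer, no defects allowed:", poisson_yield(DEFECT_DENSITY, WAFER_CM2))
    print("whole wafer, 100 defects tolerated:",
          yield_with_tolerance(DEFECT_DENSITY, WAFER_CM2, 100))
```

Under these assumptions a small die usually comes out clean, a whole wafer with zero tolerance for defects essentially never does, and a whole wafer that can route around a modest number of bad units is usable most of the time. Real fabs use more refined yield models, so treat this as an order-of-magnitude picture only.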

Do failed wafers have to go in the trash, or can you recycle them?
What's the difference between disabling faulty cores and disabling the parts of the wafer that have defects?
I'm not an expert, but I think those are the same thing. For an LLM etched onto a whole wafer, though, it doesn't make sense to disable part of it, since that would remove some of the weights entirely.
Is that defect easy to detect?
We do. The Cerebras line of Wafer Scale Engines is exactly that: an entire wafer of cores running in parallel, with fast memory next to each one. It's intended for very high throughput LLM inference. https://www.cerebras.ai/chip