upvote
Aren't they only using the SRAM for the KV cache? They mention that the hardwired weights have a very high density. They say about the ROM part:

> We have got this scheme for the mask ROM recall fabric – the hard-wired part – where we can store four bits away and do the multiply related to it – everything – with a single transistor. So the density is basically insane.

I'm not a hardware guy but they seem to be making a strong distinction between the techniques they're using for the weights vs KV cache

> In the current generation, our density is 8 billion parameters on the hard wired part of the chip., plus the SRAM to allow us to do KV caches, adaptations like fine tuning, and etc.

reply