> We have got this scheme for the mask ROM recall fabric – the hard-wired part – where we can store four bits away and do the multiply related to it – everything – with a single transistor. So the density is basically insane.
I'm not a hardware guy but they seem to be making a strong distinction between the techniques they're using for the weights vs KV cache
> In the current generation, our density is 8 billion parameters on the hard wired part of the chip., plus the SRAM to allow us to do KV caches, adaptations like fine tuning, and etc.