(Still compelling!)
To be fair, 2.5 kW does sound like too much for a single 3x3 cm chip; it would probably melt.
Yeah, though I suppose once we get properly 3D silicon I would not be surprised by a power rating like that; 3 cm^3 of compute would be something to behold.
With these speeds you could run it over USB 2.0, though power delivery might be the limiting factor.
But sure, the next generation could be much smaller. It doesn't require battery cells, (much) heat management, or ruggedization, all of which put hard limits on how much you can miniaturise power banks.
But as you said, the next generations are very likely to shrink (especially with them saying they want to run top-of-the-line models within two generations), and architecture improvements could probably shrink it even further.
Nowadays, your average cellphone has more computing power than those behemoths.
I have a microSD card with 256 GB capacity, and I think they go up to 2 TB now. On a device the size of a fingernail.
The form factor should be anything but a thumbdrive.
I haven't had my coffee yet. ;)
In fact, I was thinking that robots of the future could have such slots, letting them swap in different models depending on the task they're given. Like a hardware MoE.
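Something like this, as a very rough sketch (the Cartridge interface, slot names, and the task-to-slot mapping are all made up for illustration, not any real device API):

    # Hypothetical sketch: route a task to whichever model "cartridge" fits it.
    # Everything here is invented to illustrate the "hardware MoE" idea.
    from dataclasses import dataclass

    @dataclass
    class Cartridge:
        name: str
        specialty: str  # e.g. "vision", "navigation", "dialogue"

        def generate(self, prompt: str) -> str:
            # On real hardware this would hand the prompt to the chip in this slot.
            return f"[{self.name}] response to: {prompt}"

    class HardwareMoE:
        def __init__(self, slots: dict[str, Cartridge]):
            self.slots = slots  # task type -> installed cartridge

        def route(self, task_type: str, prompt: str) -> str:
            cartridge = self.slots.get(task_type)
            if cartridge is None:
                raise ValueError(f"no cartridge installed for {task_type!r}")
            return cartridge.generate(prompt)

    robot = HardwareMoE({
        "vision": Cartridge("vision-chip", "vision"),
        "dialogue": Cartridge("chat-chip", "dialogue"),
    })
    print(robot.route("dialogue", "Where should I put this box?"))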
Is this accurate? I don't know enough about hardware, but perhaps someone could clarify: how hard would it be to reverse engineer this to "leak" the model weights? Is it even possible?
There are some labs that sell access to their models (Mistral, Cohere, etc.) without having their models open. I could see a world where more companies do this if it turns out to be a viable way to ship models. Even to end customers, if reverse engineering is deemed impossible. You could have a device that does most of the inference locally and only "calls home" when stumped (think Alexa with local processing for intent detection and cloud processing for the rest, but better).
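Roughly this pattern, as a toy sketch (local_generate, cloud_generate, and the confidence threshold are placeholders I made up, not any real product's API):

    # Sketch of "local first, call home when stumped".
    CONFIDENCE_THRESHOLD = 0.7  # arbitrary cut-off, just for illustration

    def local_generate(prompt: str) -> tuple[str, float]:
        # Pretend the on-device model also reports a confidence score.
        return "local answer", 0.55

    def cloud_generate(prompt: str) -> str:
        # Network call to the vendor's hosted model; stubbed out here.
        return "cloud answer"

    def answer(prompt: str) -> str:
        text, confidence = local_generate(prompt)
        if confidence >= CONFIDENCE_THRESHOLD:
            return text                # good enough, never leaves the device
        return cloud_generate(prompt)  # stumped: fall back to the hosted model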
I doubt anyone would have the skills, wallet, and tools to RE one of these and extract model weights to run them on other hardware. Maybe state actors like the Chinese government or similar could pull that off.
I doubt it would scale linearly, but for home use 170 tokens/s at 2.5 W would be cool; 17 tokens/s at 0.25 W would be awesome.
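For what it's worth, the linear-scaling assumption just means constant energy per token, which is easy to sanity-check:

    # Energy per token under the (optimistic) linear-scaling assumption above.
    for watts, tok_per_s in [(2.5, 170), (0.25, 17)]:
        joules_per_token = watts / tok_per_s
        print(f"{watts} W @ {tok_per_s} tok/s -> {joules_per_token * 1000:.1f} mJ/token")
    # Both cases come out to ~14.7 mJ per token, i.e. the same efficiency,
    # just throttled down; real chips tend to get less efficient at low power.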
On the other hand, this may be a step towards positronic brains (https://en.wikipedia.org/wiki/Positronic_brain)