upvote
It reminds me of the switch from GPUs to ASICs in bitcoin mining. I've been expecting this to happen.
reply
But the BTC mining algorithm has not and will not change. That's the only reason ASICs make at least a bit of sense for crypto.

The assumption that AI means static weights is already challenged by the frequent model updates we see - and it may become a relic entirely once we find a new architecture.

reply
We can expect the model landscape to consolidate some day. Progress will become slower, innovations will become smaller. Not tomorrow, not next year, but the time will come.

And then it'll increasingly make sense to build such a chip into laptops, smartphones, wearables. Not for high-end tasks, but to drive the everyday bread-and-butter tasks.

reply
The world continues to evolve in a way that requires flexibility - not more constraints. I just fail to see a future where we want fewer general-purpose computers and more hard-wired ones. Would be interesting to be proven wrong though!
reply
Sounds to me like there’s potential to use these for established models to provide cost/scale advantage while frontier models will run in the existing setup.
reply
IME Llama et al. require LoRA or fine-tuning to be usable. That's their real value vs closed-source massive models, and their small size makes this possible, appealing, and doable on a recurring basis as things evolve. Again, rendering ASICs useless.
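For context, a minimal sketch of the LoRA idea (all names and sizes here are illustrative, not from any vendor's spec): the base weight matrix W stays frozen, and the fine-tuned update is a low-rank product B·A. The adapter is a tiny fraction of the base parameters, which is why it is at least plausible that adapters could live in a small SRAM next to hard-wired weights.

```python
import numpy as np

d, r = 4096, 8  # hidden size and LoRA rank (illustrative values)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))         # frozen base weight (the part an ASIC would bake in)
A = rng.standard_normal((r, d)) * 0.01  # trainable low-rank factor
B = np.zeros((d, r))                    # B starts at zero, so the adapter is initially a no-op

x = rng.standard_normal(d)
y = W @ x + B @ (A @ x)  # adapted forward pass: W itself is never modified

base_params = W.size                # 4096 * 4096 = 16,777,216
adapter_params = A.size + B.size    # 2 * 8 * 4096 = 65,536, under 0.4% of the base
print(adapter_params / base_params)
```

The point of the arithmetic: swapping adapters means rewriting ~0.4% of the parameters per layer at rank 8, which is a very different storage problem than re-spinning the baked-in weights.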
reply
Read the blog post. It mentions that their chip has a small SRAM which can store LoRA weights.
reply
Neither the blog nor Taalas' original post specifies what speed to expect when using the SRAM in conjunction with the baked-in weights. To be taken seriously, that really needs to be explained in detail, not just mentioned in passing.
reply
Heh, I said this exact thing in another thread the other day. Nice to see I wasn't the only one thinking it.
reply
The middle ground here would be an FPGA, but I believe you would need a very expensive one to implement an LLM on it.
reply
FPGAs would be less efficient than GPUs.

FPGAs don't scale; if they did, all GPUs would've been replaced by FPGAs for graphics a long time ago.

You use an FPGA when spinning a custom ASIC doesn't make financial sense and a generic processor such as a CPU or GPU is overkill.

Arguably the middle ground here is TPUs: they take the most efficient parts of a "GPU" for these workloads while still relying on memory access at every step of the computation.

reply
I thought it was because the number of logic elements in a GPU is orders of magnitude higher than in an FPGA, rather than just processing speed. And GPU processing is inherently parallel, so the GPU beats the FPGA just based on transistor count.
reply