upvote
It reminds me of the switch from GPUs to ASICs in bitcoin mining. I've been expecting this to happen.
reply
But the BTC mining algorithm has not and will not change. That's the only reason ASICs make at least a bit of sense for crypto.

The assumption that AI means static weights is already challenged by the frequent model updates we see - and it may become a relic entirely once we find a new architecture.

reply
We can expect the model landscape to consolidate some day. Progress will become slower, innovations will become smaller. Not tomorrow, not next year, but the time will come.

And then it'll increasingly make sense to build such a chip into laptops, smartphones, wearables. Not for high-end tasks, but to drive the everyday bread-and-butter tasks.

reply
The world continues to evolve in a way that requires flexibility - not more constraints. I just fail to see a future where we want fewer general-purpose computers and more hard-wired ones. Would be interesting to be proven wrong though!
reply
Sounds to me like there’s potential to use these for established models to provide cost/scale advantage while frontier models will run in the existing setup.
reply
IME Llama et al. require LoRA or fine-tuning to be usable. That's their real value vs closed-source massive models, and their small size makes this possible, appealing, and doable on a recurring basis as things evolve. Again, rendering ASICs useless.
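For context, a minimal sketch of the LoRA idea (all names and sizes here are illustrative, not from any vendor's spec): the base weight matrix W stays frozen, and the fine-tuned update is a low-rank product B·A. The adapter is a tiny fraction of the base parameters, which is why it is at least plausible that adapters could live in a small SRAM next to hard-wired weights.

```python
import numpy as np

d, r = 4096, 8  # hidden size and LoRA rank (illustrative values)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))         # frozen base weight (the part an ASIC would bake in)
A = rng.standard_normal((r, d)) * 0.01  # trainable low-rank factor
B = np.zeros((d, r))                    # B starts at zero, so the adapter is initially a no-op

x = rng.standard_normal(d)
y = W @ x + B @ (A @ x)  # adapted forward pass: W itself is never modified

base_params = W.size                # 4096 * 4096 = 16,777,216
adapter_params = A.size + B.size    # 2 * 8 * 4096 = 65,536, under 0.4% of the base
print(adapter_params / base_params)
```

The point of the arithmetic: swapping adapters means rewriting ~0.4% of the parameters per layer at rank 8, which is a very different storage problem than re-spinning the baked-in weights.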
reply
Read the blog post. It mentions that their chip has a small SRAM which can store LoRA weights.
reply
Neither the blog nor Taalas' original post specifies what speed to expect when using the SRAM in conjunction with the baked-in weights. To be taken seriously, that really needs to be explained in detail, not just mentioned in passing.
reply
Heh, I said this exact thing in another thread the other day. Nice to see I wasn't the only one thinking it.
reply
The middle ground here would be an FPGA, but I believe you would need a very expensive one to implement an LLM on it.
reply
FPGAs would be less efficient than GPUs.

FPGAs don't scale; if they did, all GPUs would've been replaced by FPGAs for graphics a long time ago.

You use an FPGA when spinning a custom ASIC doesn't make financial sense and a generic processor such as a CPU or GPU is overkill.

Arguably the middle ground here is TPUs: they take the most efficient parts of a "GPU" for these workloads while still relying on memory access at every step of the computation.

reply
I thought it was because the number of logic elements in a GPU is orders of magnitude higher than in an FPGA, rather than just processing speed. And GPU processing is inherently parallel, so the GPU beats the FPGA just based on transistor count.
reply