Taalas. A sibling comment to yours posted the chat demo URL:

https://chatjimmy.ai/

reply
Whoa. How is this working? It's stupid fast.
reply
The weights are mapped directly to transistors. It's not a general-purpose processor; it's literally a dedicated Llama 8B chip that can't be used for anything else. The more you specialize the hardware, the faster it gets, and Taalas is pushing that to the limit.
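
A loose software analogy for "weights mapped directly to transistors", in case it helps: the sketch below compares a generic matrix-vector multiply, where the weights are data read at run time, with a generated function that has the weights baked in as literal constants. This only illustrates specialization. It says nothing about how Taalas actually builds its chip, and `generic_matvec`, `bake_matvec`, and `TOY_WEIGHTS` are made-up names and values.

```python
# Software analogy only: real hardwired silicon is not produced this way.

def generic_matvec(weights, x):
    """General-purpose path: weights are ordinary data, read at run time."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def bake_matvec(weights):
    """Specialized path: generate a function with the weights as literals.

    Zero weights are dropped because there is nothing to compute for them.
    The generated function can only ever apply this one matrix.
    """
    rows = []
    for row in weights:
        terms = [f"{w!r} * x[{i}]" for i, w in enumerate(row) if w != 0]
        rows.append(" + ".join(terms) or "0")
    src = "def baked(x):\n    return [" + ", ".join(rows) + "]\n"
    namespace = {}
    exec(src, namespace)  # compile the specialized function
    return namespace["baked"], src


TOY_WEIGHTS = [[2, 0, -1], [0, 3, 0]]  # hypothetical toy values
baked, source = bake_matvec(TOY_WEIGHTS)

x = [1, 2, 3]
# Both paths agree, but the baked one has no weight lookups left in it.
assert baked(x) == generic_matvec(TOY_WEIGHTS, x)
```

Printing `source` shows the trade-off: the specialized function is shorter and does less work, but changing the model means generating a new function, which on real hardware would mean a new chip.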

They seem to be doing well. I checked recently and their API is closed to signups due to overwhelming demand.

reply
Cerebras.

They built a wafer-scale ASIC: the entire wafer is one huge active chip. It takes a lot of clever engineering and cooling to make it work, and it's very cool.

reply
Groq.
reply
No, it was a custom ASIC with the weights baked in for a single model. I do envision a future where we return to cartridges: local AI is the de facto default, and massively optimised chips are built to be plug-and-play, each running a single SoTA model.
reply