Maybe this is a naive question, but why wouldn't there be a market for this even for frontier models? If Anthropic wanted to burn Opus 4.6 into a chip, wouldn't there theoretically be a price point at which this would lower their inference costs?
reply
Because we don't know whether this approach scales to high-quality frontier models. If you need to manufacture dedicated hardware for each new model, that adds a lot of expense and creates a lot of e-waste once the next model releases. In contrast, even the current iteration seems like it would be fantastic for low-grade LLM work.

For example, searching a database of tens of millions of text files. Very little "intelligence" is required, but cost and speed matter a lot. If you want to know something specific from Wikipedia but don't want to figure out which article to search, you could just have an LLM read the entire English Wikipedia (7,140,211 articles) and compile a report. Doing that would be prohibitively expensive and glacially slow with standard LLM providers, but Taalas could probably do it in a few minutes or even seconds, and it would probably be pretty cheap.
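To make that concrete, here's a rough sketch of what such a sweep could look like. The endpoint URL, response shape, prompt, and ask() helper are all made up for illustration; the point is just the map-then-reduce shape: fan the question out over every article with a cheap, fast model, then compile the hits into a report.

    # Rough sketch of a brute-force "read everything" sweep. The endpoint,
    # model, and prompt are hypothetical -- the shape is what matters:
    # map a cheap, fast model over every article, then reduce the hits.

    import concurrent.futures
    import requests

    FAST_LLM = "http://localhost:8080/v1/completions"  # hypothetical fast/cheap endpoint

    def ask(article_text: str, question: str) -> str:
        """Ask the model whether this article answers the question."""
        prompt = (
            f"Question: {question}\n\n"
            f"Article:\n{article_text[:8000]}\n\n"
            "If the article answers the question, quote the relevant passage. "
            "Otherwise reply NO."
        )
        resp = requests.post(FAST_LLM, json={"prompt": prompt, "max_tokens": 200})
        resp.raise_for_status()
        return resp.json()["choices"][0]["text"].strip()

    def sweep(articles: dict[str, str], question: str) -> list[tuple[str, str]]:
        """Map the question over every article in parallel; keep the hits."""
        hits = []
        with concurrent.futures.ThreadPoolExecutor(max_workers=64) as pool:
            futures = {pool.submit(ask, text, question): title
                       for title, text in articles.items()}
            for fut in concurrent.futures.as_completed(futures):
                answer = fut.result()
                if not answer.upper().startswith("NO"):
                    hits.append((futures[fut], answer))
        return hits  # feed these into one final "compile a report" call

At today's per-token prices that loop is absurd; at the throughput and cost this chip is claiming, it stops being absurd.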

reply
Many older models are still better at "creative" tasks, because newer models have been optimized for code and reasoning benchmarks. Pre-training is what gives a model its creativity, and layering SFT and RL on top tends to strip some of it away in exchange for instruction following.
reply
Exactly. One easily relatable use case is structured content extraction and/or conversion of web page data to markdown. I used to use Groq for this (the gpt-oss-20b model), but even that felt slow when doing the task at scale.
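For anyone curious, a minimal sketch of that pipeline against Groq's OpenAI-compatible endpoint (the model id is my guess at how it's listed there, and the prompt is just one reasonable way to phrase the task):

    # Minimal sketch of the HTML -> markdown conversion described above,
    # via Groq's OpenAI-compatible API. Model id is assumed -- check
    # Groq's model list for the actual identifier.

    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.groq.com/openai/v1",
        api_key=os.environ["GROQ_API_KEY"],
    )

    def html_to_markdown(html: str) -> str:
        """Ask the model to strip boilerplate and return clean markdown."""
        resp = client.chat.completions.create(
            model="openai/gpt-oss-20b",  # assumed model id on Groq
            messages=[
                {"role": "system",
                 "content": "Convert the given HTML to clean markdown. "
                            "Drop navigation, ads, and boilerplate. "
                            "Return only the markdown."},
                {"role": "user", "content": html},
            ],
        )
        return resp.choices[0].message.content

Run that over millions of pages and the per-call latency dominates, which is exactly where a chip like this would change the math.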

LLMs have opened up a natural-language interface to machines. This chip makes it real-time, and that opens up a lot of use cases.

reply
These seem ideal for robotics applications, where there's a narrow, low-latency use case these chips could serve, maybe even running locally on the robot.
reply