Data tagging? 20k tok/s is at the point where I'd consider running an LLM on data from a column of a database, and these <=100 token problems provide the least chance of hallucination or stupidity.
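Roughly what I have in mind, as a sketch (the table/column names are made up and `tag_with_llm` is a hypothetical stand-in for whatever actually serves the model):

    import sqlite3

    def tag_with_llm(text: str) -> str:
        # Hypothetical call to the fast endpoint. Prompt plus answer stays
        # under ~100 tokens, so 20k tok/s works out to ~200 rows per second.
        prompt = (
            "Answer with one word: electronics, clothing, food, or other.\n"
            f"Description: {text}\nCategory:"
        )
        raise NotImplementedError("wire this up to whatever serves the model")

    conn = sqlite3.connect("catalog.db")
    rows = conn.execute("SELECT id, description FROM products").fetchall()
    for row_id, description in rows:
        tag = tag_with_llm(description)
        conn.execute("UPDATE products SET category = ? WHERE id = ?", (tag, row_id))
    conn.commit()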
reply
Alternatively, you could run far more RAG and thinking to integrate recent knowledge; I'd imagine models designed for this would put less emphasis on world knowledge and more on agentic search.
reply
Maybe; models with more embedded associations are also better at search. (Intuitively, this tracks: a model with no world knowledge, i.e. a pure Markov model, has no awareness of synonyms or relations, so the more knowledge a model has, the better it can search.) It’s not clear it’s possible to build such a model, since there doesn’t seem to be a scaling cliff where you could shed world knowledge without also losing search ability.
reply
Where are those numbers from? It's not immediately clear to me that you can distribute one model across chips with this design.

> Model is etched onto the silicon chip. So can’t change anything about the model after the chip has been designed and manufactured.

Subtle detail here: the fastest turnaround that one could reasonably expect on that process is about six months. This might eventually be useful, but at the moment it seems like the model churn is huge and people insist you use this week's model for best results.

reply

  > The first generation HC1 chip is implemented in the 6 nanometer N6 process from TSMC. Each HC1 chip has 53 billion transistors on the package, most of it very likely for ROM and SRAM memory. The HC1 card burns about 200 watts, says Bajic, and a two-socket X86 server with ten HC1 cards in it runs 2,500 watts.
https://www.nextplatform.com/2026/02/19/taalas-etches-ai-mod...
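For what it's worth, the numbers in that quote are self-consistent (back-of-envelope; the host-overhead figure is just the remainder):

    CARD_WATTS = 200        # per HC1 card, from the article
    CARDS_PER_SERVER = 10
    SERVER_WATTS = 2500     # quoted for the whole two-socket box

    cards_total = CARD_WATTS * CARDS_PER_SERVER   # 2000 W
    host_overhead = SERVER_WATTS - cards_total    # ~500 W for CPUs, RAM, fans
    print(cards_total, host_overhead)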
reply
And what in that makes you assume all ten HC1 cards in a server are needed to run a single model?
reply
So it lights money on fire extra fast; AI-focused VCs are going to really love it then!!
reply
Well, they claim a two-month turnaround. Big if true. How does the six months break down, in your estimation? Maybe they've found a way to reduce the turnaround time.
reply
This depends on how much better models get from here on. If Claude Opus 4.6 were transformed into one of these chips and ran at a hypothetical 17k tokens/second, I'm sure that would be astounding; whether it stays worthwhile depends on how much better Claude Opus 5 turns out to be compared to the current generation.
reply
I’m pretty sure they’d need a small data center to run a model the size of Opus.
reply
Even an o3-quality model at that speed would be incredible for a great many tasks. Not everything needs to be Claude Code. Imagine Apple fine-tuning a mid-tier reasoning model on personal assistant/macOS/iOS sorts of tasks and burning a chip onto the Mac Studio motherboard. Could you run Claude Code on it? Probably not. Would it be 1000x better than Siri? Absolutely.
reply
Yeah, waiting for Apple to cut a die that can do excellent local AI.
reply
100x the speed of a less capable model might be better than 1x of a better model for many, many applications.

This isn't ready for phones yet, but think of something like phones, where people buy new ones every three years; even a mediocre on-device model at that speed would be incredible for something like Siri.

reply
A lot of NLP tasks could benefit from this.
reply
> What is a task that is extremely high value, only require a small model intelligence, require tremendous speed, is ok to run on a cloud due to power requirements, AND will be used for years without change since the model is etched into silicon?

Video game NPCs?

reply
Doesn’t pass the "extremely high value" or "requires tremendous speed" tests.
reply
Speed = capacity = cost.
reply
Video games are a huge market, and the speed and cost of current models are definitely huge barriers to integrating LLMs into them.
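To put rough numbers on the speed barrier (all figures assumed for illustration, except the 20k tok/s rate discussed upthread):

    TOKENS_PER_LINE = 40             # one short NPC response
    CLOUD_TOK_PER_S = 60             # assumed streaming rate for a hosted model
    ETCHED_TOK_PER_S = 20_000        # the rate discussed upthread

    cloud_ms = TOKENS_PER_LINE / CLOUD_TOK_PER_S * 1000    # ~667 ms: a visible stall
    etched_ms = TOKENS_PER_LINE / ETCHED_TOK_PER_S * 1000  # 2 ms: under one 60 fps frame
    print(f"{cloud_ms:.0f} ms vs {etched_ms:.0f} ms per line of dialogue")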
reply
CEO
reply
No one would ever give such a weak model that much power over a company.
reply
I'm thinking the best end result would come from custom-built models. An 8-billion-parameter generalized model will run really quickly while not being particularly good at anything. But the same parameter count dedicated to parsing emails, RAG summarization, or some other specialized task could be more than good enough while also running at crazy speeds.
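A sketch of the email-parsing case (everything here is hypothetical: `specialized_model` stands in for the etched 8B inference call, and the schema is made up):

    import json

    def specialized_model(prompt: str) -> str:
        raise NotImplementedError("stand-in for the etched 8B model")

    def parse_email(body: str) -> dict:
        # The model only ever does this one job, so the prompt can be terse
        # and the output constrained to a fixed JSON shape.
        prompt = (
            'Extract {"intent": str, "due_date": str|null, '
            '"action_items": [str]} as JSON from this email:\n\n' + body
        )
        return json.loads(specialized_model(prompt))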
reply