undefined

points

[-]

Their 2.4 kW is for 10 chips it seems based on the next platform article.

I assume they need all 10 chips for their 8B q3 model. Otherwise, they would have said so or they would have put a more impressive model as the demo.

https://www.nextplatform.com/2026/02/19/taalas-etches-ai-mod...

by audunw11 hours ago|

parent|

[-]

It doesn’t make any sense to think you need the whole server to run one model. It’s much more likely that each server runs 10 instances of the model

1. It doesn’t make sense in terms of architecture. It’s one chip. You can’t split one model over 10 identical hardwire chips

2. It doesn’t add up with their claims of better power efficiency. 2.4kW for one model would be really bad.

by aurareturn11 hours ago|

parent|

[-]

We are both wrong.

First, it is likely one chip for llama 8B q3 with 1k context size. This could fit into around 3GB of SRAM which is about the theoretical maximum for TSMC N6 reticle limit.

Second, their plan is to etch larger models across multiple connected chips. It’s physically impossible to run bigger models otherwise since 3GB SRAM is about the max you can have on an 850mm2 chip.

  followed by a frontier-class large language model running inference across a collection of HC cards by year-end under its HC2 architecture

https://mlq.ai/news/taalas-secures-169m-funding-to-develop-a...

by pigpop1 hours ago|

parent|

[-]

Aren't they only using the SRAM for the KV cache? They mention that the hardwired weights have a very high density. They say about the ROM part:

> We have got this scheme for the mask ROM recall fabric – the hard-wired part – where we can store four bits away and do the multiply related to it – everything – with a single transistor. So the density is basically insane.

I'm not a hardware guy but they seem to be making a strong distinction between the techniques they're using for the weights vs KV cache

> In the current generation, our density is 8 billion parameters on the hard wired part of the chip., plus the SRAM to allow us to do KV caches, adaptations like fine tuning, and etc.

by moralestapia11 hours ago|

parent|

prev|

[-]

Thanks for having a brain.

Not sure who started that "split into 10 chips" claim, it's just dumb.

This is Llama 3B hardcoded (literally) on one chip. That's what the startup is about, they emphasize this multiple times.

by aurareturn11 hours ago|

parent|

[-]

It’s just dumb to think that one chip per model is their plan. They stated that their plan is to chain multiple chips together.

I was indeed wrong about 10 chips. I thought they would use llama 8B 16bit and a few thousand context size. It turns out, they used llama 8B 3bit with around 1k context size. That made me assume they must have chained multiple chips together since the max SRAM on TSMC n6 for reticle sized chip is only around 3GB.