undefined

points

by Alifatisk6 hours ago |

comments

by mike_hearn5 hours ago|

[-]

They don't give cost figures in their blog post but they do here:

https://www.nextplatform.com/wp-content/uploads/2026/02/taal...

Probably they don't know what the market will bear and want to do some exploratory pricing, hence the "contact us" API access form. That's fair enough. But they're claiming orders of magnitude cost reduction.

> Is there really a potential market for hardware designed for one model only?

I'm sure there is. Models are largely interchangeable especially as the low end. There are lots of use cases where you don't need super smart models but cheapness and fastness can matter a lot.

Think about a simple use case: a company has a list of one million customer names but no information about gender or age. They'd like to get a rough understanding of this. Mapping name -> guessed gender, rough guess of age is a simple problem for even dumb LLMs. I just tried it on ChatJimmy and it worked fine. For this kind of exploratory data problem you really benefit from mass parallelism, low cost and low latency.

> Shouldn't there be a more flexible way?

The whole point of their design is to sacrifice flexibility for speed, although they claim they support fine tunes via LoRAs. LLMs are already supremely flexible so it probably doesn't matter.

by pigpop1 hours ago|

parent|

[-]

Yes, there are all kinds of fuzzy NLP tasks that this would be great for. Jobs where you can chunk the text into small units and add instructions and only need a short response. You could burn through huge data sets very quickly using these chips.

by himata41135 hours ago|

prev|

[-]

I personally don't buy it, cerebras is way more advanced than this, comparing this tok/s to cerebras is disingenious.