When output is good enough, other considerations become more important. Most people on this planet cannot afford even an AI subscription, and the cost of tokens is prohibitive for many low-margin businesses. Privacy and personalization matter too, and data sovereignty is a hot topic. Besides, we already see how the focus has shifted to orchestration, which can be done on a CPU and is cheap. Software optimizations may compensate for hardware deficiencies, so a hardwired model is not going to be frozen in capability. I think the market for local hardware inference is bigger than the market for cloud inference, and it's going to repeat the Android vs iOS story.

by wmf 2 hours ago

Taalas is more expensive than NPUs, not less. You have a GPU/NPU at home; just use it.

by ivan_gammel 1 hour ago

I feel weird defending Taalas here, but this argument is quite strange: of course it is more expensive now. That is irrelevant, since all innovations are expensive at an early stage. The question is what this technology will cost tomorrow. Can it do for consumers what NPUs could not, offering good UX and quality of inference for a reasonable price?

by wmf 1 hour ago

It will always be more expensive.

by bigyabai 5 hours ago

This is the same justification that was used to ship the (now almost entirely defunct) NPUs on Apple and Android devices alike.

The A18 iPhone chip has 15b transistors for the GPU and CPU; the Taalas ASIC has 53b transistors dedicated to inference alone. If it's anything like NPUs, almost all vendors will bypass the baked-in silicon to use GPU acceleration past a certain point. It makes much more sense to ship a CUDA-style flexible GPGPU architecture.

by ivan_gammel 4 hours ago

Why are you thinking about phones specifically? Most heavy users are on laptops and workstations. On smartphones, a few more innovations might be necessary (low-latency AI computing on the edge?).

by bigyabai 2 hours ago

Many laptops and workstations also fell for the NPU meme, which in retrospect was a mistake compared to reworking your GPU architecture. Those NPUs are all dark silicon now, just like these Taalas chips will be in 12-24 months.

Dedicated inference ASICs are a dead end. You can't reprogram them, you can't finetune them, and they won't keep any of their resale value. Outside of cruise missiles, it's hard to imagine where such a disposable technology would be desirable.

by ivan_gammel 1 hour ago

Most consumers do not care about reprogramming or fine-tuning and have no idea what an NPU is. For many (including specifically those who still mourn dead AI companions, killed by the 4o switch), long-term stability is much more important than the benchmark performance of an evergreen frontier model. If Taalas can produce a good hardwired model at scale at a consumer-market price point, a lot of people will just drop their AI subscriptions.

by bigyabai 28 minutes ago

> a lot of people will just drop their AI subscriptions.

For a 2.5 kW server? I don't see it happening; your money and electricity are better spent on CUDA compute.

by sowbug 6 hours ago

Bake in a Genius Bar employee, trained on your model's hardware, whose entire reason for existence is to fix your computer when it breaks. If it takes an extra 50 cents of die space but saves Apple a dollar of support costs over the lifetime of the device, it's worth it.

by padjo 11 hours ago

Is progress still exponential? It feels like it's flattening to me. It is hard to quantify, but if you could get Opus 4.2 to work at the speed of the Taalas demo and run locally, I feel like I'd get an awful lot done.

by r0b05 11 hours ago

Yeah, the space moves so quickly that I would not want to couple the hardware with a model that might be outdated in a month. There are some interesting talking points, but a general-purpose programmable ASIC makes more sense to me.

by RobertDeNiro 11 hours ago

It won’t stay exponential forever.

by selcuka 10 hours ago

> what is the point of that

Planned obsolescence? /s

Jokes aside, they can make the "LLM chip" removable. I know almost nothing is replaceable in MacBooks, but this could be an exception.