In a not-too-distant future (5 years?) small LLMs will be good enough to be used as generic models for most tasks. And if you have a dedicated ASIC small enough to fit in an iPhone, you have a truly local AI device, with the bonus that there's something genuinely new to sell in every generation (i.e. access to an even more powerful model).
reply
The Taalas approach is much more expensive than the NPU that phones already have.
reply
Yes, but not in five years. The chips will be dirt cheap by then. We'll get "intelligent" washing machines that will discuss the amount of detergent and eventually berate us. Toasters with voice input. And really annoying elevators. Also bugs that keep an extremely low RF profile (only phoning home when the target is talking business).
reply
No, Taalas requires more silicon, which will always cost more than storing weights in DRAM.
reply
It doesn't need to go in the phone if it only takes a few milliseconds to respond and is cheap.
reply
Perceptible latency is somewhere between 10 and 100ms. Even if an LLM were hosted in every AWS region in the world, latency would likely be annoying if you were expecting near-realtime responses (for example, if you were using an LLM as autocomplete while typing). If, say, Apple had an LLM on a chip that any app could access through an SDK, it could feasibly unlock a whole bunch of use cases that would be impractical with a network call.

Also, offline access is still a necessity for many use cases. If you have something like an autocomplete feature that stops working when you're on the subway, the change in UX between offline and online makes the feature more disruptive than helpful.

https://www.cloudping.co/
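To put the 10–100ms threshold in perspective, a rough way to estimate best-case network latency (independent of any particular provider or the cloudping numbers above) is to time a bare TCP handshake to a host, which excludes HTTP and TLS overhead. A minimal sketch; the function name and defaults are illustrative, not from any SDK:

```python
import socket
import time

def tcp_rtt(host, port=443, samples=5):
    """Estimate best-case round-trip time (ms) to host:port
    by timing TCP handshakes and taking the minimum."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        # The connect() call completes one network round trip.
        with socket.create_connection((host, port), timeout=2):
            pass
        times.append((time.perf_counter() - start) * 1000)
    return min(times)
```

Anything this returns is a floor on response time for a hosted model; an on-chip model skips the network entirely, so the budget goes to inference alone.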

reply
It does if you care about who can access your tokens.
reply
The real benefit, to a very particular type of mind, is that the alignment will be baked in (presumably a lot more robust than today) and wrongthink will be eliminated once and for all. It will also help flag anyone who would need anything as dangerous as custom, uncensored models. Win/win.

To your point, it's neat tech, but the limitations are obvious, since 'printing' only one LLM ensures further concentration of power. In other words, history repeats itself.

reply
It doesn't have to be true for all models to be useful. Thinking about small models running on phones or edge devices deployed in the field, that would be a perfect use case for a "printed model".
reply