The Taalas approach is much more expensive than the NPUs that phones already have.
reply
Yes, but not in five years. The chips will be dirt cheap by then. We’ll get “intelligent” washing machines that discuss the amount of detergent and eventually berate us. Toasters with voice input. And really annoying elevators. Also bugs that keep an extremely low RF profile (only phoning home when the target is talking business).
reply
No, Taalas requires more silicon, which will always cost more than storing weights in DRAM.
reply
It doesn’t need to go in the phone if it only takes a few milliseconds to respond and is cheap.
reply
Perceptible latency is somewhere between 10 and 100 ms. Even if an LLM were hosted in every AWS region in the world, latency would likely be annoying if you expected near-realtime responses (for example, using an LLM as autocomplete while typing). If, say, Apple had an LLM on a chip that any app could access through an SDK, it could feasibly unlock a whole bunch of use cases that would be impractical with a network call.
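
To make the budget concrete, here is a back-of-envelope sketch in Python; the 60 ms RTT and 50 ms inference figures are assumptions picked for illustration (real cloudping-style RTTs vary a lot by location), not measurements:

    # Back-of-envelope latency budget for LLM autocomplete.
    # All numbers are illustrative assumptions, not measurements.

    PERCEPTIBLE_MS = 100  # upper end of the 10-100 ms window above

    def cloud_total(rtt_ms, inference_ms):
        # Hosted model: network round trip plus model time.
        return rtt_ms + inference_ms

    def on_device_total(inference_ms):
        # Local model: no network hop at all.
        return inference_ms

    # Assume ~60 ms RTT to the nearest region and ~50 ms to
    # produce a short completion.
    for label, total in [("cloud", cloud_total(60, 50)),
                         ("on-device", on_device_total(50))]:
        verdict = "fine" if total <= PERCEPTIBLE_MS else "perceptible lag"
        print(f"{label}: {total} ms -> {verdict}")

Under these assumptions the cloud path lands at 110 ms, just past the perceptibility window, while the on-device path stays well inside it.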

Also, offline access is still a necessity for many use cases. If an autocomplete feature stops working when you're on the subway, the gap in UX between offline and online makes the feature more disruptive than helpful.

https://www.cloudping.co/

reply
It does if you care about who can access your tokens.
reply