LLM usage is not very likely to "dry up".

What is more likely to happen, though, is that building out models stops requiring tens of billions of dollars in datacenter capital, and performance on LLM benchmarks starts to max out to the point where throwing more capital at it doesn't make enough of a difference to matter.

Once the costs shrink below $1B, Apple could start building their own models with the $139B in cash and marketable securities that they have, while everyone else has burned through $100B trying to be first.

Of course the problem with this strategy right now is that Siri really, really sucks. They do need to come up with some product improvements now so that they don't get completely lapped.

reply
Those things could likely run just fine on the GPU, though.
reply
They could run fine on the CPU too. But these are mobile devices, so battery usage is another significant metric. Dedicated hardware is more energy efficient than general-purpose hardware, and the GPU in particular is a power hog.
reply
Exactly. It's the same thing as video or audio encoding and decoding. Sure, the CPU could do it, or potentially the GPU, but having actual hardware encoders and decoders for the most common codecs saves a lot of energy.
reply
Not if GPU RAM is a limiter, which it is for most models.

Unified memory is a serious architectural improvement.

How many GPUs does it take to match the RAM, and make up for the additional communication overhead, of a RAM-maxed Mac? Whatever the answer, it won’t fit in a MacBook Pro’s physical and energy envelopes. Or that of an all-in-one like the Studio.
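
To make the "how many GPUs" question concrete, here is a rough back-of-the-envelope sketch. The memory figures are made-up placeholders for illustration, not specs of any real Mac or GPU, and `gpus_needed` is a hypothetical helper. It only counts capacity and ignores the inter-GPU communication overhead mentioned above, so treat the result as a lower bound.

```python
import math


def gpus_needed(target_gb: float, vram_per_gpu_gb: float) -> int:
    """Smallest number of discrete GPUs whose combined VRAM reaches target_gb.

    Capacity only: communication overhead between GPUs is not modeled.
    """
    return math.ceil(target_gb / vram_per_gpu_gb)


# Hypothetical numbers, assumed purely for illustration.
unified_memory_gb = 128  # assumed RAM-maxed unified-memory machine
vram_per_gpu_gb = 24     # assumed VRAM on one discrete GPU

print(gpus_needed(unified_memory_gb, vram_per_gpu_gb))  # prints 6
```

Swap in whatever real figures you care about; the point is just that the count grows quickly once a single card's VRAM is well below the unified pool.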

reply