Running models on the phone is never going to make sense.

If you're loading gigabytes of model weights into memory, you're also pushing gigabytes through the compute for inference. No matter how you slice it, no matter how dense you make the chips, that's going to cost a lot of energy. It's too energy intensive, simple as.
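A rough way to see the argument in numbers: when decoding one token at a time, a dense model reads essentially all of its weights per generated token, so energy per token scales with weight bytes times whatever it costs to move a byte. This is a minimal sketch of that proportionality only. The `pj_per_byte` values are hypothetical placeholders, not measured figures for any chip, and compute energy and KV-cache traffic are ignored.

```python
# Back-of-envelope: energy spent just moving weights from memory, per generated token.
# ASSUMPTIONS: dense model, batch size 1 decoding (every weight is read once per token),
# compute energy and KV-cache traffic ignored. `pj_per_byte` is a hypothetical
# placeholder for memory access energy, not a measured figure for any chip.


def weight_bytes(params: float, bits_per_weight: int) -> float:
    """Bytes occupied by the model weights at the given quantization."""
    return params * bits_per_weight / 8


def memory_energy_per_token_joules(
    params: float, bits_per_weight: int, pj_per_byte: float
) -> float:
    """Joules to stream all weights through the memory system once (one token)."""
    return weight_bytes(params, bits_per_weight) * pj_per_byte * 1e-12  # pJ -> J


if __name__ == "__main__":
    # A 7B model at 4-bit quantization reads about 3.5 GB of weights per token.
    print(f"weights read per token: {weight_bytes(7e9, 4) / 1e9:.1f} GB")
    for pj in (50.0, 100.0, 200.0):  # hypothetical per-byte costs
        joules = memory_energy_per_token_joules(7e9, 4, pj)
        print(f"{pj:>5.0f} pJ/byte -> {joules:.2f} J per token")
```

The point is the linear scaling: under these assumptions, energy per token grows in proportion to weight bytes, so a larger or less-quantized model costs proportionally more for every token it generates.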

"On device" inference (for large LLMs, I mean) is a total red herring. You basically never want to do it unless you have unique privacy considerations and a power cable attached to the wall. For a phone, maybe you would want a very small model (something around 3B parameters) for Siri-like capabilities.

On a phone, each query/response is going to cost you 0.5% of your battery. That just isn't tenable for the way these models are being used.

Try this for yourself. Load a 7B model on your laptop and talk to it for 30 minutes. These things suck energy like a vacuum, even the shitty models. A network round trip gets you hundreds of tokens from a SOTA model and costs 1 joule. By contrast, a single forward pass (one token) of a shitty 7B model costs 1 joule. It's just not tenable.
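Combining the two figures in this comment gives a quick consistency check. The 0.5%-per-query and roughly 1 J-per-token numbers are the commenter's estimates; the 13 Wh battery capacity is an assumed round figure for a phone, not the spec of any particular model.

```python
# Back-of-envelope check of the figures quoted in this comment.
# ASSUMPTION: a ~13 Wh phone battery (a round illustrative number, not a spec).
# The 0.5%-per-query and ~1 J-per-token figures are the commenter's estimates.

JOULES_PER_WH = 3600.0  # 1 watt-hour = 3600 joules


def query_budget_joules(battery_wh: float, fraction_of_battery: float) -> float:
    """Energy drawn by one query that uses `fraction_of_battery` of a full charge."""
    return battery_wh * JOULES_PER_WH * fraction_of_battery


def local_tokens_for_budget(budget_joules: float, joules_per_token: float) -> float:
    """How many locally generated tokens fit in that energy budget."""
    return budget_joules / joules_per_token


def queries_per_charge(fraction_of_battery: float) -> float:
    """Full-charge query count if every query costs the same fraction."""
    return 1.0 / fraction_of_battery


if __name__ == "__main__":
    budget = query_budget_joules(battery_wh=13.0, fraction_of_battery=0.005)
    print(f"energy per query: {budget:.0f} J")
    print(f"local tokens per query at 1 J/token: {local_tokens_for_budget(budget, 1.0):.0f}")
    print(f"queries per full charge: {queries_per_charge(0.005):.0f}")
```

Under those assumptions, 0.5% of a charge is about 234 J, i.e. a couple of hundred locally generated tokens at 1 J each, and roughly 200 such queries empty the battery. The commenter's network figure puts a round trip returning hundreds of tokens at about 1 J.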

Huh, I hadn't thought of battery limitations. Good call. My initial reaction is that bigger/better batteries, hyper-fast recharge times and more efficient processors might address this issue, but I need to learn more about it.

That said, power consumption is one of the reasons I think pushing this stuff to the edge is the only real path for AI in terms of a business model. It basically spreads the load and passes the cost of power to the end user, rather than trying to figure out how to pay for it at the data center level.

In a recent episode of the Dwarkesh Podcast, the guest, a semiconductor industry analyst, predicted that an iPhone will increase in price by about $250 for the same hardware due to increased RAM/chip costs from AI. Apple will not be able to afford to put a bunch more RAM into the phones and still sell them.

> In a recent episode of Dwarkesh the guest who is a semiconductor industry analyst predicted that an iPhone will increase in price by about $250 for the same stuff due to increased ram/chip costs from AI. Apple will not be able to afford to put a bunch more RAM into the phones and still sell them.

Apple recently stated on an earnings call that they signed contracts with RAM vendors before prices got out of control, so they should be good for a while. Nvidia also uses TSMC for its chips, and that demand may affect A-series and M-series chip production.

Yes, TSMC has a plant in Arizona, but my understanding is that they can't make the cutting-edge chips there, at least not yet.

RAM is just too expensive. We need to bring back non-DRAM persistent memory that doesn't have the wear-out issues of NAND.

Multiple NAND; Apple already used it in the Mac Studio. Plus better cooling.

I think this is roughly true, but RAM will instead remain a discriminator, even more so than today. If the scaling laws Apple has control over are compute and model size, then they'll pretty easily be able to map that onto their existing price tiers.

Pros will want higher intelligence or throughput. Less demanding or knowledgeable customers will get price-funneled to what Apple thinks is the market premium for their use case.
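One way to picture mapping model size onto price tiers: the RAM in each tier caps the largest model whose weights fit. This is a sketch under stated assumptions only. The tier sizes, the 4-bit quantization, and the idea that half of RAM can go to the model are all hypothetical illustration values, not Apple's numbers, and KV cache and activations are ignored.

```python
# Sketch: largest model whose weights fit in a given RAM tier.
# ASSUMPTIONS (all hypothetical): tier sizes, 4-bit weights, and that half of RAM
# can be given to the model; KV cache and activations are ignored. 1 GB = 1e9 bytes.

HYPOTHETICAL_TIERS_GB = (8, 12, 16)


def max_params_for_ram(ram_gb: float, bits_per_weight: int, usable_fraction: float) -> float:
    """Parameter count whose weights fill `usable_fraction` of `ram_gb`."""
    usable_bytes = ram_gb * 1e9 * usable_fraction
    return usable_bytes * 8 / bits_per_weight


if __name__ == "__main__":
    for gb in HYPOTHETICAL_TIERS_GB:
        params = max_params_for_ram(gb, bits_per_weight=4, usable_fraction=0.5)
        print(f"{gb:>2} GB tier: up to ~{params / 1e9:.0f}B parameters at 4-bit")
```

With these illustrative inputs, each tier's model ceiling scales linearly with its RAM, which is the sense in which RAM could keep acting as the price discriminator.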

It'll probably be a little harder to keep their developers RAM-disciplined (if that's even still true) for typical concerns. But model swap will be a big deal. The same exit-vs-voice issues will exist for Apple customers, but the margin logic seems to remain.

> nor create specialized SoCs with ML cores that obviate the need for lots and lots of RAM

Why do you say they can't do this?

Possibly this just isn't the generation of hardware to solve this problem in? We're, what, three or four years in at most, and barely two into AI-assisted development being practical. I wouldn't want to be the first mover here, and I don't know if it's a good point in history to try to solve the problem. Everything we're doing right now with AI, we will likely not be doing in five years. If I were running a company like Apple, I'd just sit on the problem until the technology stabilizes and matures.

If I was running a company like Apple, I'd be working with Khronos to kill CUDA since yesterday. There are multiple trillions of dollars that could be Apple's if they sign CUDA drivers on macOS, or create a CUDA-compatible layer. Instead, Apple is spinning their wheels and promoting nothingburger technology like the NPU and MPS.

It's not like Apple's GPU designs are world-class anyway; they're basically neck-and-neck with AMD for raster efficiency. Except, unlike AMD, Apple has all the resources in the world to compete with Nvidia and simply chooses to sit on its ass.

CUDA is not the real issue: AMD's HIP offers source-level compatibility with CUDA code, and ZLUDA even provides raw binary compatibility. Nvidia GPUs really are quite good, and the projected advantages of going multi-vendor just aren't worth the hassle given how architecture-specific GPUs are going to be.

Okay, then don't kill CUDA; just sign CUDA drivers on macOS instead and quit pretending MPS is a world-class solution. There are trillions on the table; this is not an unsolvable issue.

Admittedly, my use of CUDA and Metal is fairly surface-level. But I have had great success using LLMs to convert whole Gaussian splatting CUDA codebases to Metal. It's not ideal for maintainability and not 1:1, but if CUDA was a moat for Nvidia, I believe LLMs have dealt a blow to it.