The author is debugging the on-device model's tensor operations with a simple prompt. They confirmed the discrepancy by comparing against other iPhone models.

It’s no different from someone testing a calculator with 2+2. If it gets that wrong, there’s a hardware issue. That doesn’t mean the calculator's only purpose is to compute 2+2; the trivial input is for debugging.

You could just as uncharitably complain that “these days no one does arithmetic anymore, they use a calculator for 2+2”.

reply
The app-making wasn't being done on the phone.

The LLM that malfunctioned was there to slap categories on things. And something was going wrong in either the hardware or the compiler.

reply
I mean, Apple's own LLM also doesn't work on this device. The author compared the outputs of each iterative calculation on this device against other devices, and they diverge from every other Apple device. That's a pretty big sign that something is genuinely different about this device, and the same broken behavior persisted across multiple OS versions. Is the hardware or the software "responsible"? Who knows; there's no smoking gun, but something does seem genuinely wrong.
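The comparison described above (checking each iterative calculation on the suspect device against a known-good device and finding where they diverge) can be sketched roughly like this. This is just an illustration of the general technique, not the author's actual code; `first_divergence` and the sample arrays are hypothetical:

```python
import numpy as np

def first_divergence(device_outputs, reference_outputs, atol=1e-5):
    """Return the index of the first op whose output diverges from the
    reference device's output, or None if every op matches."""
    for i, (dev, ref) in enumerate(zip(device_outputs, reference_outputs)):
        if not np.allclose(dev, ref, atol=atol):
            return i
    return None

# Hypothetical per-op intermediate outputs captured on two devices:
ref = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
bad = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.5])]

print(first_divergence(bad, ref))  # divergence first appears at op index 2
```

Pinpointing the first diverging op is what lets you distinguish "the model is bad" from "this device computes this operation differently from every other device."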

I don't get the snark about LLMs in this context. The author uses an LLM to help write their code, but is also clearly competent enough to dig in and determine why things don't work when the LLM fails, and ran an LLM-out-of-the-loop debugging session once they decided it wasn't trustworthy. What else could you do in this situation?

reply