undefined

points

[-]

> Or, rather, MiniMax is! The good thing about offloading your work to an LLM is that you can blame it for your shortcomings. Time to get my hands dirty and do it myself, typing code on my keyboard, like the ancient Mayan and Aztec programmers probably did.

They noticed a discrepancy, then went back and wrote code to perform the same operations by hand, without the use of an LLM at all in the code production step. The results still diverged unpredictably from the baseline.

Normally, expecting floating-point MAC operations to produce deterministic results on modern hardware is a fool's errand; they usually operate asynchronously and so the non-commutative properties of floating-point addition rear their head and you get some divergence.

But an order of magnitude difference plus Apple's own LLM not working on this device suggests strongly to me that there is something wrong. Whether it's the silicon or the software would demand more investigation, but this is a well reasoned bug in my book.

by ErroneousBosh21 hours ago|

parent|

[-]

> Time to get my hands dirty and do it myself, typing code on my keyboard, like the ancient Mayan and Aztec programmers probably did.

https://ia800806.us.archive.org/20/items/TheFeelingOfPower/T...

I should think I'll probably see someone posting this on the front page of HN tomorrow, no doubt. I first read it when it was already enormously old, possibly nearly 30 years old, in the mid 1980s when I was about 11 or 12 and starting high school, and voraciously reading all the Golden Age Sci-Fi I could lay my grubby wee hands on. I still think about it, often.

by netsharc20 hours ago|

prev|

[-]

I found the article hard to read. I turned on reader mode. I still found it hard to read. Each sentence is very short. My organic CPU spins trying to figure out how each sentence connects to the next. Each sentence feels more like a paragraph, or a tweet, instead of having a flow. I think that's my issue with it.

by mr_toad19 hours ago|

parent|

[-]

If it was written in turgid prose people would be frantically waggling their AI accusatory fingers.

by netsharc19 hours ago|

parent|

[-]

Instead he writes Buzzfeed style: a sentence per paragraph, and then smushes several paragraphs into one.

(The idea being, a paragraph usually introduces a new thought.)

by decimalenough21 hours ago|

prev|

[-]

My TL;DR is that they tried to run an on-device model to classify expenses, it didn't work even for simple cases ("Kasai Kitchin" -> "unknown"), they went deeeeeep down the rabbit hole to figure out why and concluded that inference on their particular model/phone is borked at the hardware level.

Whether you should do this on device is another story entirely.

by jojobas21 hours ago|

parent|

[-]

Why shouldn't you? It's your device, it has hardware made specifically for inference.

What's to be gained, other than battery life, by offloading inference to someone else? To be lost, at least, is your data ownership and perhaps money.

by dghlsakjg21 hours ago|

parent|

[-]

> What's to be gained... by offloading inference to someone else?

Access to models that local hardware can't run. The kind of model that an iphone struggles to run is blown out of the water by most low end hosted models. Its the same reason that most devs opt for claude code, cursor, copilot, etc. instead of using hosted models for coding assistance.

by selcuka20 hours ago|

parent|

[-]

But apparently this model is sufficient for what the OP wants to do. Also apparently it works on iPhone 15 and 17, but not on 16.

by jojobas20 hours ago|

parent|

prev|

[-]

Claude code produces stuff orders of magnitude more complicated than classifying expenses. If the task can be run locally on hardware you own anyway, it should.

by wolvoleo18 hours ago|

parent|

prev|

[-]

I would really not want to upload my expense data to some random cloud server, nope. On device is really a benefit even if it's not quite as comprehensive. And really in line with apple's privacy focus so it's very imaginable that many of their customers agree.

by 21 hours ago|

prev|

[-]

deleted

by 20 hours ago|

prev|

[-]

deleted