I run 32B models locally on a big video card, and smaller ones on CPU, but what's lacking there isn't the logic or reasoning; it's the chain of tooling that Claude Code and other stacks have built in.
I've been doing a lot of testing recently with my own harness, and you would not believe the quality improvement you can get from a smaller LLM with really good opening context.
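To make "good opening context" concrete, here's a minimal sketch of the idea: front-load a small model's context window with dense, curated project facts before the user's request, rather than making it infer basics. Everything here is illustrative (`build_opening_context`, `make_messages`, and the example strings are my own names, not from any particular harness or library); the shape targets a chat-completions style local endpoint like llama.cpp's server or Ollama, which isn't shown.

```python
# Hypothetical sketch of "opening context" assembly for a small local model.
# All names and content below are illustrative, not from a real harness.

def build_opening_context(project_summary, tool_descriptions, conventions):
    """Assemble a dense system prompt so a small model starts with the
    basics spelled out instead of having to guess them."""
    sections = [
        "## Project\n" + project_summary,
        "## Available tools\n" + "\n".join(f"- {t}" for t in tool_descriptions),
        "## Conventions\n" + "\n".join(f"- {c}" for c in conventions),
    ]
    return "\n\n".join(sections)

def make_messages(opening_context, user_task):
    # Chat-completions style message list; the curated context goes in
    # the system role so it frames every turn that follows.
    return [
        {"role": "system", "content": opening_context},
        {"role": "user", "content": user_task},
    ]

ctx = build_opening_context(
    "CLI tool in Go; single module; tests live under ./internal.",
    ["run_tests: execute `go test ./...`", "read_file: return file contents"],
    ["table-driven tests", "no global state"],
)
messages = make_messages(ctx, "Add a --verbose flag to the root command.")
```

The point isn't the code, it's the ratio: a few hundred well-chosen tokens up front buy more from a small model than letting it burn turns rediscovering the project layout.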
Even Microsoft is working on 1-bit LLMs...it sucks right now, but what about in 5 years?
But the OP is correct -- everything will have an LLM on it eventually, and much sooner than people who don't understand what is going on right now would ever believe possible.
Your idea of what people need from local LLMs and what others need are different. Not everybody needs /r/myboyfriendisai-level performance.