I run 32B models locally on a big video card, and smaller ones on CPU, but what's lacking there isn't the logic or reasoning; it's the chain of tooling that Claude Code and other stacks have built in.
I've been doing a lot of testing recently with my own harness, and you would not believe the quality improvement you can get from a smaller LLM with really good opening context.
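To make "good opening context" concrete, here's a minimal sketch of the idea: front-load a small model's context window with dense, curated project facts before the user's request, rather than making it infer basics. Everything here is illustrative (`build_opening_context`, `make_messages`, and the example strings are my own names, not from any particular harness or library); the shape targets a chat-completions style local endpoint like llama.cpp's server or Ollama, which isn't shown.

```python
# Hypothetical sketch of "opening context" assembly for a small local model.
# All names and content below are illustrative, not from a real harness.

def build_opening_context(project_summary, tool_descriptions, conventions):
    """Assemble a dense system prompt so a small model starts with the
    basics spelled out instead of having to guess them."""
    sections = [
        "## Project\n" + project_summary,
        "## Available tools\n" + "\n".join(f"- {t}" for t in tool_descriptions),
        "## Conventions\n" + "\n".join(f"- {c}" for c in conventions),
    ]
    return "\n\n".join(sections)

def make_messages(opening_context, user_task):
    # Chat-completions style message list; the curated context goes in
    # the system role so it frames every turn that follows.
    return [
        {"role": "system", "content": opening_context},
        {"role": "user", "content": user_task},
    ]

ctx = build_opening_context(
    "CLI tool in Go; single module; tests live under ./internal.",
    ["run_tests: execute `go test ./...`", "read_file: return file contents"],
    ["table-driven tests", "no global state"],
)
messages = make_messages(ctx, "Add a --verbose flag to the root command.")
```

The point isn't the code, it's the ratio: a few hundred well-chosen tokens up front buy more from a small model than letting it burn turns rediscovering the project layout.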
Even Microsoft is working on 1-bit LLMs...it sucks right now, but what about in 5 years?
But the OP is correct -- everything will have an LLM on it eventually, and much sooner than people who don't understand what is going on right now would ever believe possible.
Your idea of what people need from local LLMs and what others need are different. Not everybody needs /r/myboyfriendisai-level performance.