The most powerful AI interactions I've had involved giving a model a task and then fucking off. At that point, I don't actually care if it takes 5 minutes or an hour. I've queued up a list of background tasks it can work on, and that I can circle back to when I have time. In that context, smaller isn't even the virtue at hand; user patience is. A machine that works through my bullshit questions and modelling projects at one tenth the speed of a datacentre could still be a good deal, even before considering the privacy and lock-in problems.
I use Claude and Kagi Assistant. I tried tooling up a multi-model environment in Ollama and it was annoying. It's just searching the web, building models, and then running a test suite against each model to refine it.
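The search, build, test, refine workflow above is essentially a retry loop. Here's a minimal sketch, assuming the building, testing and refining steps can each be wrapped as a plain function; `refine_until_passing` and its parameters are hypothetical names, and nothing here talks to Claude, Kagi Assistant or Ollama.

```python
from typing import Callable, TypeVar

M = TypeVar("M")  # whatever a "model" is in your project


def refine_until_passing(
    build: Callable[[], M],
    run_tests: Callable[[M], list[str]],
    refine: Callable[[M, list[str]], M],
    max_rounds: int = 5,
) -> tuple[M, list[str]]:
    """Build a model, test it, and feed failures back until the suite passes.

    Returns the last model and its remaining failures (empty if all passed).
    """
    model = build()
    failures = run_tests(model)
    for _ in range(max_rounds):
        if not failures:
            break
        # Hand the failing test messages to the refine step and re-test.
        model = refine(model, failures)
        failures = run_tests(model)
    return model, failures


# Toy usage: the "model" is a guess at sqrt(2), refined with Newton steps.
def sqrt2_tests(x: float) -> list[str]:
    return [] if abs(x * x - 2) < 1e-9 else [f"x*x = {x * x}"]


model, failures = refine_until_passing(
    build=lambda: 1.0,
    run_tests=sqrt2_tests,
    refine=lambda x, _failures: (x + 2 / x) / 2,
)
print(model, failures)
```

The `max_rounds` cap is there so a model that never converges doesn't burn compute forever while you're off doing something else.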
It's pretty clear that this isn't going to happen any time soon, if ever. You can't shrink the models without destroying their coherence, and that observation has held up consistently across the board.
Smaller models have gotten much more powerful over the last 2 years; Qwen 3.5 is one example. The cost and compute required to run the same level of intelligence are going down.
The inputs are parsed with a large LLM. This gets passed on to a smaller, hyper-specific model, which outputs to a large LLM to make the result readable.
Essentially, you can blend two model types: probabilistic input > deterministic function > probabilistic output, with multiple little deterministic models that are chosen for specific tasks. Now, all of this is VERY easy to say and VERY difficult to do.
But if it could be done, it would basically shrink all the models needed. You don't need a huge input/output model if it's acting more as an interpreter.
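The probabilistic input > deterministic function > probabilistic output idea can be sketched in a few lines. This is a toy under heavy assumptions: `parse` and `render` are hypothetical stand-ins for the large LLMs (a real version would prompt a model to emit the structured `Intent`, e.g. as JSON), and the "little deterministic models" are reduced to plain functions in a lookup table.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Intent:
    task: str               # name of the specialist to route to
    args: dict[str, float]  # structured arguments extracted from the request


# Small deterministic "models": same input, same output, easy to test.
def compound_interest(args: dict[str, float]) -> float:
    return args["principal"] * (1 + args["rate"]) ** args["years"]


def celsius_to_fahrenheit(args: dict[str, float]) -> float:
    return args["c"] * 9 / 5 + 32


SPECIALISTS: dict[str, Callable[[dict[str, float]], float]] = {
    "compound_interest": compound_interest,
    "celsius_to_fahrenheit": celsius_to_fahrenheit,
}


def parse(request: str) -> Intent:
    # Hypothetical stand-in for the large input-side LLM. A real system would
    # turn free text into an Intent; here we only accept "task key=value ...".
    task, *pairs = request.split()
    args = {key: float(value) for key, value in (p.split("=", 1) for p in pairs)}
    return Intent(task, args)


def render(intent: Intent, result: float) -> str:
    # Hypothetical stand-in for the large output-side LLM that makes it readable.
    return f"The {intent.task.replace('_', ' ')} result is {result:.2f}."


def answer(request: str) -> str:
    intent = parse(request)                # probabilistic input (an LLM in practice)
    specialist = SPECIALISTS[intent.task]  # choose the specialist for this task
    result = specialist(intent.args)       # deterministic function
    return render(intent, result)          # probabilistic output (an LLM in practice)


print(answer("compound_interest principal=1000 rate=0.05 years=2"))
# prints: The compound interest result is 1102.50.
```

The arithmetic lives entirely in the middle step, so it can be unit-tested like any other function. An unknown task name raises `KeyError` here; a real router would need a fallback, which is part of why this is hard to do well.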
Give every iPhone family an in-house Siri that will deal with canceling services and pursuing refunds.
Screw up with a customer, and your site gets an agent-driven DDoS on its CS department until you give in.
Siri: "Hey User, here's your daily update, I see you haven't been to the gym, would you like me to harass their customer service department till they let you out of their onerous contract?"
I don’t need the latest and greatest, and I’ve fine-tuned my LM Studio setup enough that I get acceptable results in 30 to 90 seconds, which helps me keep moving ahead. I’m not a software engineer, and I’m definitely not as much of a “coder” as the average person on HN. So if I can do it for less than $2000, I bet a lot of smarter, more experienced coders could see great results for $5000.
You can get an M3 Ultra Mac Studio with 96GB of RAM for $4000. If you’re willing to go up to $6k, it’s 256GB. Wayyyyy more firepower than my setup. I imagine it’s plenty powerful for a lot of people.
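For a rough sense of what that RAM buys, a common back-of-the-envelope rule is that a model's weights take about (parameters × bits per weight ÷ 8) bytes. The sketch below applies that rule; it ignores the KV cache, context length and runtime overhead, and `usable_fraction` is an assumed placeholder for how much memory is actually available to the model, not a measured figure.

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def max_params_billions(ram_gb: float, bits_per_weight: float,
                        usable_fraction: float = 0.75) -> float:
    """Largest model (billions of params) whose weights fit in the usable RAM.

    usable_fraction is an assumption: the OS and other apps need memory too.
    """
    return ram_gb * usable_fraction * 8 / bits_per_weight


for ram in (96, 256):
    print(f"{ram} GB: ~{max_params_billions(ram, 4):.0f}B params at 4-bit")
```

By the same rule, a 70B model quantized to 4 bits needs roughly 35 GB just for its weights, so treat these numbers as upper bounds rather than what will actually run comfortably.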