> My dream would be a local model that can do, say, 80% of the day to day tasks I need; "how does X Handler connect to Y storage?", "commit that feature, but leave out the bits that relate to billing" etc.

Qwen 3.6 27B can do that today, but only if it is set up properly and in a good quant. I run an AutoRound quant [0] with weights in int8 and attention heads in f16 on a single RTX 6000 Pro Blackwell Max-Q via vLLM, with mtp=2, full context, --max-num-seqs 3, KV cache in f16, and mamba in f32.

> It would have 99% reliable tool calling

I managed to score 93/100 in tool-eval-bench [1]. For me this is already very good. In the Pi coding harness, at least, I've never had an issue that wasn't auto-fixed in the next turn(s).

> the ability to go "this task is beyond my skills" and refer to a Big Boy Online Model in a gigantic datacenter somewhere

I think this is mostly a harness engineering problem, but it also runs quite contrary to the nature of LLMs today. If you figure this out I'd love to know.

[0] https://huggingface.co/Minachist/Qwen3.6-27B-INT8-AutoRound/...

[1] https://github.com/SeraphimSerapis/tool-eval-bench

Claude kind of has this already in their Advisor feature. I don't think I've seen it elsewhere. Open harnesses could add this feature and call out to big boy models when required. It's a really great idea.

It’s a lot harder to get right than it sounds. I’ve been trying to build it as a Pi extension, but models are biased to think they’re better than they actually are.

So far the best results I’ve got have come from using a much smaller local model as a simple classifier that decides where to route each request, based on the system prompt and the incoming prompt. It works okay, but there's still a long way to go.
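
A minimal sketch of that routing shape, not the actual extension: the decision is made by a separate classifier before the main model ever sees the prompt, so the main model is never asked to judge its own ability. Everything named here is hypothetical. `keyword_difficulty` is a crude keyword stand-in for the small classifier model, and the `0.5` threshold and the marker list are arbitrary assumptions.

```python
from dataclasses import dataclass
from typing import Callable

LOCAL = "local"    # hypothetical label for the local model
REMOTE = "remote"  # hypothetical label for the big hosted model


@dataclass
class Route:
    target: str   # LOCAL or REMOTE
    score: float  # difficulty estimate in [0, 1]


def keyword_difficulty(system_prompt: str, user_prompt: str) -> float:
    """Stand-in for the small classifier model (assumption, not a real model).

    Counts a few hand-picked "hard task" markers and maps the count to [0, 1].
    """
    hard_markers = ("refactor", "architecture", "race condition", "migrate", "design")
    text = f"{system_prompt}\n{user_prompt}".lower()
    hits = sum(marker in text for marker in hard_markers)
    return min(1.0, hits / 2)


def route(
    system_prompt: str,
    user_prompt: str,
    classify: Callable[[str, str], float] = keyword_difficulty,
    threshold: float = 0.5,
) -> Route:
    """Pick a target from the classifier's score alone."""
    score = classify(system_prompt, user_prompt)
    target = REMOTE if score >= threshold else LOCAL
    return Route(target, score)


if __name__ == "__main__":
    system = "You are a coding assistant."
    print(route(system, "commit that feature, but leave out the bits that relate to billing"))
    print(route(system, "Design a plan to migrate the storage layer"))
```

Swapping `classify` for a call into a real small model keeps the rest of the harness unchanged, which makes it easy to compare classifiers against the same set of prompts.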
