> but with a good harness they are able to achieve things with SotA that I couldn't last year.

What happens if you run last year's model in a SOTA harness? IME, the quality of the harness has a much more significant impact on the quality of the result, once you get past the initial hump of “can it do anything at all”.

I think this is a big component, but so is context. A large part of any model's ability to handle complexity comes down to context length.

I think multiple SLMs driven by an orchestration framework (harness or otherwise) will ultimately displace LLMs. Right now we're in an era of diminishing returns on LLM gains. Moving the needle by a few percentage points doesn't excite as many people anymore, and with "reasoning" capabilities there's no reason why small distributed models can't be run more efficiently, especially if/when we start to see gains in modularized context management solutions.

Sure, but high-quality harnesses require less GPU compute and VRAM, and can plausibly be used locally by most users.

> Have you personally used any of the latest batch of even smaller local models?

No, I have not, which is why I asked (it wasn't a rhetorical question). Do you have pointers on what the recent improvements are?

Try the Qwen 3.6 models with Hermes and see for yourself. 27B is excellent and 35B is very good for basic agentic tasks.

Can you spare a sentence or two describing your local setup?

The biggest thing I wish were present in more discussions about models is people providing specifics on their setups, rather than vague descriptions of harnesses.

Can you please share details about your harness?