Doesn't justify 10x the cost in that case imo
Buying the most expensive circular saw doesn't get you the best woodworking, but it is the most expensive woodworking.
https://medium.com/@adambaitch/the-model-vs-the-harness-whic... | https://aakashgupta.medium.com/2025-was-agents-2026-is-agent... | https://x.com/Hxlfed14/status/2028116431876116660 | https://www.langchain.com/blog/the-anatomy-of-an-agent-harne...
(I don't think anecdotes are useful in these comparisons, but I'll throw mine in anyway: I use GPT-5.4, GPT-5.3-Codex, Gemini-3-Pro, Opus, and Sonnet at work every week. I then switch to GLM-5.1 and K2-Thinking. Other than how chatty they get and how they handle planning, I get the same results. Sometimes they're great; sometimes I spend an hour trying to coax them toward the solution I want. The more time I spend describing the problem and solution and feeding them data, the better the results, regardless of model. The biggest problem I run into lately is that every website in the world is blocking WebFetch, so I have to manually download docs, which sucks. And for 90% of my coding and system work, I see no difference between M2.5 and SOTA models, because there's only so much better you can get at writing a simple script or function or navigating a shell. This is why Anthropic themselves have always told people to use Sonnet to orchestrate complex work and Haiku for subagents. But of course they want you to pay for Opus, because they want your money.)
Also not everyone wants to use Claude Code, so if they're paying API pricing it's more likely thousands of dollars a month. If you can get the same results by spending a fraction of that, why wouldn't you?
I have an Anthropic API key for work, and if I use Sonnet/Opus all day for agent coding, it ends up costing about $25.
I'd need more CPU/RAM to run multiple agents in parallel before I could spend much more than that.
That was the breaking point: I cancelled my subscription.
As it happens, I had a low coding workload over the past two weeks, so I've been noodling around in pi, mostly with the Gemini Flash API. I like it; I even agree it's a much better harness than CC. However, the lock-in is real. Even without switching models, which each have their own quirks, I expect my work speed to drop drastically for at least a week or two, even if I were focused on it fully. But after the learning period, I think pi will be faster. The danger, of course, is that CC is fairly on rails, while with pi you could end up spending all your time tinkering with the harness.
You can't do any serious work on it without rationing your work and kneecapping your workflows, to the point where you design workflows around Anthropic usage-limit voodoo rather than what actually works.
Without this, I run into WEEKLY usage limits on the $200 plan by just day 3, working on a single codebase, one feature at a time.
On a related note, I really need to try some local models (probably starting with Qwen), since, at least in 2026, the Chinese models are way better at protecting democracy and free speech than the US models.
What would happen if they learned that half of the American small and medium-sized companies had started pouring all their business information into such a service?