The best performance I've gotten comes from mixing agents from different companies. Unless there turns out to be a "winner take all" agent (I seriously doubt it, given the dynamics and cost of collecting high-quality RL data), I think the best orchestration systems are going to involve mixing agents.
Here, it's not about the planner; it's about the workers. Some agents are just better at certain things than others.
For instance, Opus 4.6 on max does not hold a candle to GPT 5.4 xhigh at bug finding. It's not even a comparison; iykyk.
It's almost analogous to how diversity of thought can make outcomes more robust in real-world teams. The same thing seems to hold in mixture-of-agent-distributions space.
For Anthropic to have the best version of this software, they'd have to simultaneously... well, have the best version of the software, but also beat every other AI company at every subtask (e.g. technical writing, diagramming, bug finding; they'd need the unequivocal "best model" in every category).
Surely their version won't let you, e.g., invoke Codex or what have you as part of their stack.
Having Opus write a spec, sending it to Gemini to revise, back to Opus to fix, then to me to read and approve...
Then sending it to a local model like Qwen3.5 to build, and off to Opus to review...
This was such an amazing flow, until Anthropic changed their minds.
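The spec, revise, fix, approve, build, review flow described above is basically a chain of stages, each pinned to a different model, with a human gate in the middle. A minimal Python sketch, assuming every model is just a prompt-in, text-out function; `Stage`, `run_pipeline`, and the stub models are hypothetical stand-ins, not any vendor's SDK:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical: a "model" is any function from prompt text to output text.
# In a real setup each entry would wrap a different vendor's client or a local server.
ModelFn = Callable[[str], str]


@dataclass
class Stage:
    name: str                  # e.g. "draft spec", "revise", "build"
    model: str                 # key into the models dict
    instruction: str           # prepended to the previous stage's output
    needs_human: bool = False  # pause for human approval after this stage


def run_pipeline(stages: list[Stage], models: dict[str, ModelFn], task: str,
                 approve: Callable[[str], bool]) -> tuple[str, list[str]]:
    """Pass one artifact through each stage in order, recording which model touched it."""
    artifact = task
    trace: list[str] = []
    for stage in stages:
        artifact = models[stage.model](f"{stage.instruction}\n\n{artifact}")
        trace.append(f"{stage.name}:{stage.model}")
        if stage.needs_human and not approve(artifact):
            raise RuntimeError(f"rejected after stage {stage.name!r}")
    return artifact, trace


# Stub models so the sketch runs offline: each one tags the last line of its input.
def stub(tag: str) -> ModelFn:
    return lambda prompt: f"[{tag}] {prompt.splitlines()[-1]}"


models = {"opus": stub("opus"), "gemini": stub("gemini"), "qwen": stub("qwen")}
stages = [
    Stage("draft spec", "opus", "Write a spec for:"),
    Stage("revise", "gemini", "Revise this spec:"),
    Stage("fix", "opus", "Address the revision notes:", needs_human=True),
    Stage("build", "qwen", "Implement this spec:"),
    Stage("review", "opus", "Review this implementation:"),
]
result, trace = run_pipeline(stages, models, "add CSV export", approve=lambda _: True)
```

The point of keeping models behind a plain dict is that swapping which vendor handles which stage is a one-line change, which is exactly what a single-provider platform takes away.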
To score a big IPO they need to be a platform, not just a token pipeline. Everything they’re doing signals they’re moving in this direction.
FWIW, IMO being locked into a single model provider is a deal breaker.
This solution will distract a lot of folks and doom-lock them into Anthropic. That’ll probably be fine for small offices, but for anything complex it is suicidal to get hooked into Anthropic’s way of doing things. IME, you want to be able to compare different models, and you end up managing them to your style. It’s a bit like cooking, where you may have a greater affinity for certain flavors. You make tradeoffs about when to use a frontier model for design and planning vs. something self-hosted for simpler operations tasks.
Which projects are standing out in this space right now?
It works on top of k8s, so you can deploy and run it in your own compute cluster. Right now it's focused only on coding tasks, but I'm working on abstractions so you can similarly orchestrate large runs of any agentic workflow.
When the models have an off day, the workflows you’ve grown to depend on fail. When you’re completely dependent on Anthropic not only for execution but also for troubleshooting, you’re doomed. You lose a whole day troubleshooting model performance variability when you should have just logged off and waited. These are very cognitively disruptive days.
Build in multi-model support, so your agents can change their routing if an observer detects performance variability.
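One way to read "build in multi-model support" is a router that keeps a rolling pass/fail record per provider (fed by whatever observer you trust: evals, test runs, a reviewer agent) and skips providers whose recent success rate drops. A minimal Python sketch, assuming hypothetical provider names and made-up thresholds; it isn't how any particular framework does this:

```python
from collections import deque


class Router:
    """Route work to the first provider (in preference order) that looks healthy.

    Hypothetical sketch: providers are just names, and an observer reports
    pass/fail outcomes for each call via observe().
    """

    def __init__(self, providers: list[str], window: int = 20,
                 min_success: float = 0.7, min_samples: int = 5):
        self.providers = providers
        self.min_success = min_success
        self.min_samples = min_samples
        # Rolling window of recent outcomes per provider.
        self.history = {p: deque(maxlen=window) for p in providers}

    def observe(self, provider: str, ok: bool) -> None:
        """Observer hook: record one outcome for a provider."""
        self.history[provider].append(ok)

    def healthy(self, provider: str) -> bool:
        h = self.history[provider]
        if len(h) < self.min_samples:
            return True  # not enough evidence yet; don't penalize
        return sum(h) / len(h) >= self.min_success

    def route(self) -> str:
        for p in self.providers:
            if self.healthy(p):
                return p
        # Everything looks degraded: stick with the top preference
        # (or pause and wait it out).
        return self.providers[0]


router = Router(["frontier_api", "self_hosted"])
for ok in [True, False, False, False, True, False]:
    router.observe("frontier_api", ok)  # an "off day": 2 of 6 passed
```

The `min_samples` guard keeps a single bad call from flipping routing; the window size and threshold are knobs you'd tune against your own evals.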
Until then, every agent framework gets completely reinvented every week due to new patterns and new models: evals, ReAct, DSPy, RLM, memory patterns, claws, dynamic context, sandbox strategies. Locking into a framework seems like a losing proposition for anyone trying to stay competitive. See also: LangChain trying to be the Next.js/Vercel of agents while everyone recommends building your own.
That said, Anthropic pulls a lot of weight by owning the models themselves, and an easier-to-use solution will probably get some adoption from those who are better served by going from nothing to something agentic, despite the lock-in and the constant churn of model tech.
That, plus everyone is using five different vector DBs and reranking models from different vendors than their answer models, etc.
Originally I thought they would stick mainly to being a model provider, but with all the recent releases it seems they do want to provide more "services."
I wonder what part of the market third-party apps will build a moat around.
There's a lot of money to be made in small business automation right now.
1. We pay for SaaS so we don't have to manage it. If you vibe-code or use these AI things, then you are managing it yourself.
2. Most SaaS costs something like $20-$100/month/person. For a software engineer, that's maybe <1h of pay.
3. Most SaaS requires some sort of human in the loop to check quality (at least by sampling). No user wants to do that.
Number 2 is the biggest reason. It's $20 a month... I'm not gonna replace that with anything.
Writing this message already costs more than $20 of my time.
I predict the market will get bigger, because people will be more inclined to automate the long-tail/last-mile stuff now that they are able to.
I can see that, assuming models don't make some giant leap forward.
Call me stupid, but this doesn't sound like they want software developers to be around in a year or two.
We've got Claude Managed Agents, Claude Agent SDK, Claude API, Claude Code, Claude Platform, Claude Cowork, Claude Enterprise, and plain old 'Claude'. And an honourable mention to Claude Haiku/Sonnet/Opus 4.{whatever} as yet another thing with the same prefix. It feels like about once a week I see a new announcement here on HN about some new agentic Claude whatever-it-is.
In the face of this, I have pretty much retreated to 'just the API + `pi` + Claude Opus 4.{most recent minor release}' as a surface area I can understand.
I own a stake in a small brewery in Canada, and this feature just saved me setting up some infrastructure to "productionize" an agent we created to assist with ordering, invoicing, and government document creation.
I get paid in beer and vibes for projects like these, so the more I can ship these projects in the same place I prototype them the better.
(Also, don't worry, all: I still have SF income to buy food for my family.)
Quick question: how do you manage these side projects that kinda need to be production-ready but aren't your actual SF job lol?
Some of these people think they're my actual customers/clients, but like, I do it for fun and to help them out.
But beyond that, AWS is a very complex platform. Agents simplify SaaS: the agent itself manages the API calls, maybe the database queries, and more of the logic. As software moves into the agent, you need less cloud capability and a better agent harness/hosting. Essentially, this makes the AWS platform obsolete; most of its services make much less sense.