This can backfire a bit on token usage, where it gets a bit too trigger-happy about running expensive things for trivial changes. I tend not to use sub-agents for this reason. I actually manage to cover most of my needs on the $20/month Codex subscription. I might switch to the $200 plan at some point, but right now I just need to be economical, as our company is fairly resource-constrained. That's also why I prefer Codex over Claude Code: it seems to get the job done for less money. Another advantage is that it seems to have less need to have things like this spelled out at this level of detail.

Another thing is that unless you are doing really complicated stuff, you probably don't need the latest models running on high. I'm still on 5.4 medium with codex. I see very little reason to change that.

Part of agentic engineering is figuring out how to be economical with tokens and time. You can sacrifice one for the other of course. But there are diminishing returns as well where spending 10x more doesn't actually get you 10x more quality/results.

reply
I just have the Anthropic 100 USD Max plan and it's enough for daily work. I sometimes hit the 5-hour limits by midday, but the weekly ones usually top out at around 80% or thereabouts, even with this approach. I usually use xhigh, sometimes max; both still result in situations where I need to intervene plenty, even on use cases that aren't that complex (some LLM stuff, mostly web-based CRUD, some light data processing, integrations with Jira and GitLab, processing PDFs and so on, sometimes ML training and geospatial work like Sentinel-2 satellite data, nothing crazy).

If I had to pay per token, I'd probably look at DeepSeek. In general it feels like it's a bit early for the technology - either our software methods are wasteful, or the hardware hasn't caught up. To me, it appears that we often need to throw more tokens at these problems, not less, since otherwise it's just one-shot slop.

reply
> once all the code seems okay, you will run THREE parallel sub-agents for code review: each looking at ALL changed code

I did some evals with a prompt like this when I had some subscription tokens to burn, a few months ago. I think using Opus 4.5. What I found was:

1. Running two subagents was somewhat useful

2. Running three started to get redundant

3. Any more than three was pointless (at least when using the same model)

However, even two were returning something like 60% of the same results.

Much, much more effective was splitting out into audits through different lenses:

* One looking for security issues

* One looking for whether the task was completed successfully

* One looking for performance issues

* One looking for contract/maintainability issues

* One looking at test coverage

Etc.
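
For anyone who wants to try the lens split, here's a minimal sketch of generating one review prompt per lens. The `LENSES` table, the `build_review_prompts` name and the prompt wording are all made up for illustration; how you actually hand each prompt to a sub-agent depends on your tool and isn't shown.

```python
# Hypothetical lens table: keys and wording are illustrative,
# taken from the list of audit angles above.
LENSES = {
    "security": "security issues",
    "completion": "whether the task was completed successfully",
    "performance": "performance issues",
    "maintainability": "contract/maintainability issues",
    "tests": "test coverage",
}


def build_review_prompts(diff: str) -> dict[str, str]:
    """Return one review prompt per lens, each covering the full diff."""
    return {
        name: (
            f"Review ALL of the changed code below, focusing only on {focus}. "
            "Ignore findings outside that focus.\n\n"
            f"{diff}"
        )
        for name, focus in LENSES.items()
    }
```

Each sub-agent still sees all of the changed code; only the question it is asked differs, which is what cuts down on the duplicate findings.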

reply
You can get reasonably close with fewer; however, more agents give better signal. E.g. if 3/3 flag something as an issue, the outer agent that orchestrates them can treat it as something to give more attention to, whereas if it's just 1/3, it probably needs more consideration first. Of course, more agreement doesn't always mean the finding is right.
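
A rough sketch of that vote counting, assuming each reviewer's output has already been normalized into a set of comparable finding labels (that normalization is the hard part and isn't shown here). The `tally_findings` name and the labels in the usage example are hypothetical.

```python
from collections import Counter


def tally_findings(reports: list[set[str]]) -> list[tuple[str, int]]:
    """Count how many reviewers flagged each finding.

    Returns (finding, votes) pairs, most-agreed-upon first,
    with ties broken alphabetically so the order is stable.
    """
    votes = Counter(finding for report in reports for finding in report)
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


# Usage: three reviewers, one finding flagged by all of them.
reports = [
    {"sql-injection", "n+1-query"},
    {"sql-injection"},
    {"sql-injection", "unused-import"},
]
ranked = tally_findings(reports)
```

The orchestrator can then look at the 3/3 items first and treat the 1/3 items as candidates to double-check rather than as confirmed issues.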
reply