I am running opus to make changes to my code then running the code. I am genuinely curious how we are having such disparate experiences here. And at this point, IMO you're in too deep not to share...
Genuinely wondering if you're running gastown or some other crazy mixture of agents pretending they're an AI startup. I get by with a developer agent and a reviewer agent ping ponging off each other encouraged to be rude, crude, and socially unacceptable about it.
One loop of this can take 20-60 min, and eat 2-5% of my week limit. I have to actively slow myself down to not burn trough more than 15-20% of my weekly limit in a day (as I also like to work on it on weekends)
Sadly I cant share the actual problem I am working on as its not my secret to disclose, but its nothing "crazy", and I am so surprised others dont have similar experience.
My observations are these things impact both quality and token consumption a lot.
- Architecture matters really- how messy code is and how poorly things are organized makes a big difference
- how context window is managed especially now with default 1M window.
- How many MCP servers are used. MCP burn a lot, CLI tools are easy , quicker and good ones don't even need any additional harness like skills etc, just prompt to suggest using them.
- Using the right tool matters a lot
- What can be done with traditional deterministic tools have to be done that way with agent controlling (or even building) the tool not doing the tool's work with tokens.
- for large refactors codemod tools AST parsers etc are better than having the agent parse and modify every module/file or inefficiently navigate the codebase with grep and sed.
- How much prep work/planning is put in before the agents starts writing code. Earlier corrections are cheaper than later after code is generated
Typically my starting prompts for the plan phase are 1-2 pages worth and take 30-60m to even put in the cli text box. With that, first I would generate detailed ADRs, documentation and breakdown issues in the ticketing system using the agent, review and correct the plan several times before attempting to have a single line written.---
It is no different as we would do with a human, typing the lines of code was always easy part once you know what you want exactly.
It feels faster to bypass the thinking phase and offload to agent entirely to either stumble around and getting low res feedback or worse just wing it, either way is just adding a lot of debt and things slow down quickly .