This rhymes a lot with the Mythical Man Month. There's some corollary Mythical Machine Month thing going on with agent developed code at the moment.
When you try to throw more agents at the problem or even more verification layer, you just kill your agility even if they would still be able to work
I asked Claude (Opus High Effort) and pasted in all the logs. I went back and forth and it very confidently made over 20 separate changes in the repo, none of which fixed the issue. Eventually I stepped in and figured out it was a versioning issue.
I fear what would happen if I ran “10 agents for 10 days” on this simple issue.
All of these things human do, and i don't think we can attribute it directly to language itself, its attention and context and we both have the same issues.
The problem is maximizing code generated per token spent. This model of "efficiency" is fundamentally broken.
Or you're working in a trendy, modern open-plan office and between the noise from the salespeople nearby talking loudly to customers on their speakerphones, some coworkers talking about their medical issues, and the guy right next to you talking loudly to himself in a different language, you're unable to concentrate at all on your programming task.
unless anthropic tomorrow comes in and takes ownership all the code claude generates, that is not changing..
What I might believe though is that agents might make rewrites a lot more easy.
“Now we know what we were trying to build - let’s do it properly this time!”
And of course, make the case that it actually needs a rewrite, instead of maintenance. See also second-system effect.
Yes, but even here one needs some oversight.
My experiments with Codex (on Extra High, even) was that a non-zero percentage of the "tests" involved opening the source code (not running it, opening it) and regexing for a bunch of substrings.
"The AI said so ..."
Not only is it difficult to verify, but also the knowledge your team had of your messy codebase is now mostly gone. I would argue there is value in knowing your codebase and that you can't have the same level of understanding with AI generated code vs yours.
I wonder if AI will avoid the inevitable pitfalls their human predecessors make in thinking "if I could just rewrite from scratch I'd make a much better version" (only to make a new set of poorly understood trade offs until the real world highlights them aggressively)
When the management recognize a tech debt, often it is too late that nobody understand the full requirement or know how things are supposed to work.
The AI agent will just make the same mistake human would make -- writing some half ass code that almost work but missing all sorts of edge case.
More modular code, strong typing, good documentation... Humans are bad at keeping too much in the short-term memory, and AI is even worse with their limited context window.