undefined

points

[-]

>I’ve been on 2 failed projects that have been entirely AI generated and it’s not that agents slow down and you can just send more agents to work on projects for longer, it’s that they becoming completely unable to make any progress whatsoever, and whatever progress they do make is wrong.

This rhymes a lot with the Mythical Man Month. There's some corollary Mythical Machine Month thing going on with agent developed code at the moment.

by jwpapi2 days ago|

prev|

[-]

Same here. I have now deleted 43k and counting lines of my codebase. There is no point in putting any AI code into production anymore as it almost always uses none or the wrong abstractions.

When you try to throw more agents at the problem or even more verification layer, you just kill your agility even if they would still be able to work

by boron10061 days ago|

parent|

[-]

Case in point, just this morning I contributed a one-line change to an open source repo and the CI started failing.

I asked Claude (Opus High Effort) and pasted in all the logs. I went back and forth and it very confidently made over 20 separate changes in the repo, none of which fixed the issue. Eventually I stepped in and figured out it was a versioning issue.

I fear what would happen if I ran “10 agents for 10 days” on this simple issue.

by RALaBarge2 days ago|

prev|

[-]

The more I work with AIs (I build AI harnessing tools), the more I see similarities between the common attention failures that humans make. I forgot this one thing and it fucks everything up, or you just told me but I have too much in my mind as context that I forget that piece, or even in the case of Claude last night attesting to me while I am ordering it around that it cannot SSH into another server but I find it SSHing into said server about the 5th time I come back with traceback and it just fixes it!

All of these things human do, and i don't think we can attribute it directly to language itself, its attention and context and we both have the same issues.

by rpdillon2 days ago|

parent|

[-]

Right, but when humans are writing the code, they have learned to focus on putting downward pressure on the complexity of the system to help mitigate this effect. I don't get the sense that agents have gotten there yet.

by pphysch2 days ago|

parent|

[-]

Big business LLMs even have the opposite incentive, to churn as many tokens as possible.

by jjk71 days ago|

parent|

[-]

At least tokens are equivalent to measuring 'thinking'... I wouldn't mind if it burned 100k tokens to output a one line change to fix a bug.

The problem is maximizing code generated per token spent. This model of "efficiency" is fundamentally broken.

by shiroiuma1 days ago|

parent|

prev|

[-]

>...I see similarities between the common attention failures that humans make. I forgot this one thing and it fucks everything up, or you just told me but I have too much in my mind as context that I forget that piece

Or you're working in a trendy, modern open-plan office and between the noise from the salespeople nearby talking loudly to customers on their speakerphones, some coworkers talking about their medical issues, and the guy right next to you talking loudly to himself in a different language, you're unable to concentrate at all on your programming task.

by nishantjani102 days ago|

prev|

[-]

this is the part of the article that I did not sit well with me either. Code is agent generated, agent can debug it but will alway be human owned.

unless anthropic tomorrow comes in and takes ownership all the code claude generates, that is not changing..

by iamflimflam12 days ago|

prev|

[-]

Very much like humans when they drown in technical debt. I think the idea that a messy codebase can be magically fixed is laughable.

What I might believe though is that agents might make rewrites a lot more easy.

“Now we know what we were trying to build - let’s do it properly this time!”

by Cthulhu_2 days ago|

parent|

[-]

Potentially, yes, but as with other software, you need to know AND have (automated) verifications on what it does, exactly.

And of course, make the case that it actually needs a rewrite, instead of maintenance. See also second-system effect.

by ben_w2 days ago|

parent|

[-]

> Potentially, yes, but as with other software, you need to know AND have (automated) verifications on what it does, exactly.

Yes, but even here one needs some oversight.

My experiments with Codex (on Extra High, even) was that a non-zero percentage of the "tests" involved opening the source code (not running it, opening it) and regexing for a bunch of substrings.

by tonyedgecombe2 days ago|

parent|

prev|

[-]

>And of course, make the case that it actually needs a rewrite, instead of maintenance.

"The AI said so ..."

by tripledry2 days ago|

parent|

prev|

[-]

I'm wondering how much value there is in a rewrite once you factor in that no one understands the new implementation as well as the old one.

Not only is it difficult to verify, but also the knowledge your team had of your messy codebase is now mostly gone. I would argue there is value in knowing your codebase and that you can't have the same level of understanding with AI generated code vs yours.

by pphysch2 days ago|

parent|

[-]

The point of a rewrite is to safely delete most of that arcane knowledge required to operate the old system, by reducing the operational complexity of it.

by Ntrails2 days ago|

parent|

prev|

[-]

> “Now we know what we were trying to build - let’s do it properly this time!”

I wonder if AI will avoid the inevitable pitfalls their human predecessors make in thinking "if I could just rewrite from scratch I'd make a much better version" (only to make a new set of poorly understood trade offs until the real world highlights them aggressively)

by j16sdiz2 days ago|

parent|

prev|

[-]

It will make rewrite quicker, not "easier".

When the management recognize a tech debt, often it is too late that nobody understand the full requirement or know how things are supposed to work.

The AI agent will just make the same mistake human would make -- writing some half ass code that almost work but missing all sorts of edge case.

by bluGill2 days ago|

parent|

[-]

I was involved in a big re-write years ago. The boss finally put the old product on his desk with a sign "[boss's name]'s product owner" - that is when people asked how should this work the most common answer was exactly like the old version. 10 years latter the rewrite is a success, but it cost over a billion dollars. I have long suspected that billion dollars could have been better spend by just fixing technical debt.

by eloisant2 days ago|

parent|

prev|

[-]

That's correct, the more I work with AI the more it's obvious that all the good practice for humans is also beneficial for AI.

More modular code, strong typing, good documentation... Humans are bad at keeping too much in the short-term memory, and AI is even worse with their limited context window.

by 79522 days ago|

prev|

[-]

Is there a case for having more encapsulation? So a class and tests are defined and the LLM only works on that.

by esafak2 days ago|

prev|

[-]

Agents run fast. Not always in the right direction. They benefit from a steady hand.

by zingar1 days ago|

prev|

[-]

Any chance of a blog post covering what you saw?