upvote
Right, except even when the invariants are documented agents get into trouble. Virtually every week I see the agent write strange code with multiple paths. It knows that the invariant _should_ hold, but it still writes a workaround for cases it doesn't. Something I see even more frequently is where the agent knows a certain exception shouldn't occur, but it does, so half the time it will choose to investigate and half the time it says, oh well, and catches the exception. In fact, it's worse. Sometimes it catches exceptions that shouldn't occur proactively as part of its "success at all costs" drive, and all these contingency plans it builds into the code make it very hard (even for the agent) to figure out why things go wrong.

Most importantly, this isn't hypothetical. We see that agents write programs that after some number of changes just collapse because they don't converge. They don't transition well between layers of abstractions, so they build contingencies into multiple layers, and the result is that after some time the codebase is just broken beyond repair and no changes can be made without breaking something (and because of all the contingencies, reproducing the breakage can be hard). This is why agents don't succeed in building even something as simple as a workable C compiler even with a full spec and thousands of human-written tests.

If the agents could code well, no one would be complaining. People complain because agent code becomes structurally unsound over time, and then it's only a matter of time until it collapses. Every fix and change you make without super careful supervision has a high chance of weakening the structure.

reply
Agents don't really know the whole codebase when they're writing the code, their context is way too tiny for that; and trying to grow context numbers doesn't really work well (most of it gets ignored). So they're always working piece-meal and these failures are entirely expected unless the codebase is rigorously built for modularity and the agent is told to work "in the small" and keep to the existing constraints.
reply