- Proliferation of new utils/helpers when equivalent ones are already defined in the codebase. Particularly a problem in larger codebases (first sketch after this list).
- Tests with bad mocks and bail-outs triggered by missing pieces of the agent's runtime environment ("I see that X isn't available, let me just stub around that..."); see the second sketch below.
- Overly defensive off-happy-path handling: returning null or the semantically "empty" response when the correct behavior is to throw an exception that will be handled properly somewhere up the call chain (third sketch below).
- Locally optimal design choices with very little "thought" given to ownership or separation of concerns.
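
A minimal sketch of the first point; every name here is invented for illustration. The codebase already has a shared helper, and the agent writes a near-duplicate anyway, with subtly different semantics, so the two now drift independently:

```python
# Hypothetical: both functions do the same job; the agent never searched
# for the existing helper before writing its own.

# app/utils/text.py (already in the codebase)
def truncate(s: str, max_len: int) -> str:
    """Shared helper: cut s down to max_len characters, ending in an ellipsis."""
    return s if len(s) <= max_len else s[: max_len - 1] + "…"

# app/reports/formatting.py (freshly generated)
def shorten_text(text: str, limit: int) -> str:
    """Near-duplicate of utils.text.truncate, but with a three-dot suffix,
    so results differ depending on which helper a call site happens to use."""
    if len(text) <= limit:
        return text
    return text[: limit - 3] + "..."
```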
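For the second point, a hedged pytest-style sketch (UserStore and the Redis-backed cache are made up): unable to reach the real service from its sandbox, the agent mocks it away and ships a test that only exercises the mock.

```python
from unittest.mock import MagicMock

class UserStore:
    """Stand-in for application code that talks to a Redis-backed cache."""
    def __init__(self, client):
        self.client = client

    def get_user(self, user_id: int) -> bytes:
        return self.client.get(f"user:{user_id}")

def test_get_user():
    # "I see that Redis isn't available, let me just stub around that..."
    fake_client = MagicMock()
    fake_client.get.return_value = b"alice"
    store = UserStore(fake_client)
    # This only round-trips the mock's canned value; the key scheme,
    # serialization, and connection handling all go untested.
    assert store.get_user(1) == b"alice"
```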
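And for the third point, a small sketch (the order-lookup names are hypothetical) contrasting the defensive version with the one that fails loudly:

```python
def load_order_defensive(orders: dict, order_id: str):
    if order_id not in orders:
        return None  # swallows the failure; the caller crashes later, far away
    return orders[order_id]

def load_order(orders: dict, order_id: str) -> dict:
    # The correct off-happy-path behavior here: throw, and let an exception
    # handler somewhere up the call chain deal with it.
    if order_id not in orders:
        raise KeyError(f"unknown order id: {order_id}")
    return orders[order_id]
```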
All of these can pretty quickly turn into a maintainability problem if you aren't keeping a close eye on things. But broadly I agree that, line for line, frontier LLM code is generally better than what humans write and miles better than what a stressed-out human developer on a short deadline usually produces.
To add to this list:
- Duplicate functions when you've asked for a slight change of functionality (e.g. write_to_database and write_to_database_with_cache), without ever updating all the calls to the old function, so you end up with a split codebase (first sketch after this list).
- In a similar vein, the backup code path of "else: do a stupid static default" instead of raising an error, which would be much more helpful for debugging (second sketch after this list).
- A strong desire to follow the architecture choices it was trained on, regardless of instruction. It was presumably trained on some high-quality, large, enterprise-y codebases, but I'm just trying to write a short little throwaway program that doesn't need the complexity. KISS seems anathema to coding agents.
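
A sketch of the split-codebase problem; the function names come from the point above, the rest is invented. You asked for caching and got a second function instead of an updated one, and only some call sites were migrated:

```python
_db: list = []      # stand-in for a real database connection
_cache: dict = {}

def write_to_database(record: dict) -> None:
    _db.append(record)

def write_to_database_with_cache(record: dict) -> None:
    """The 'slight change' added as a new function instead of an update."""
    _cache[record["id"]] = record
    _db.append(record)

def handle_signup(record: dict) -> None:
    write_to_database_with_cache(record)   # migrated call site

def handle_import(record: dict) -> None:
    write_to_database(record)              # forgotten call site: cache goes stale
```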
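And a sketch of the silent static default (the config key and timeout value are made up). A typo'd or missing key is papered over forever in the first version, while the second surfaces the broken config at the point of failure:

```python
DEFAULT_TIMEOUT_MS = 30_000

def read_timeout(config: dict) -> int:
    if "timeout_ms" in config:
        return config["timeout_ms"]
    else:
        # The "stupid static default": a broken config silently "works".
        return DEFAULT_TIMEOUT_MS

def read_timeout_strict(config: dict) -> int:
    # Erroring instead makes the bad config immediately debuggable.
    if "timeout_ms" not in config:
        raise KeyError("config is missing required key 'timeout_ms'")
    return config["timeout_ms"]
```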