upvote
I'm expecting we'll likely end up back at agents making PRs, and having to review them. Either that, or giving up on quality and dealing with very messy code. I've been trying various automated testing/linting/etc. strategies, and they only work so well.
reply
That would be a nightmare. It's one thing to review a PR generated by a human who used AI and cares about the code; it's another to review wild agents, especially when they make changes everywhere.
reply
I'm not excited about it either, but the only ways I've been able to discover LLM-isms that sneak in are

1. seeing them flash by in the agent's window as it's making edits (i.e. manual oversight), or
2. running into an unexpected issue down the line.

If LLMs cannot automatically generate high-quality code, it seems like it may be difficult to automatically notice when they generate bad code.

reply
> I would love to figure out how to stop that from happening automatically.

AGENTS.md

reply
> AGENTS.md

-- which will be ignored just often enough that you can never quite trust it.

reply
Yup. No matter how much you tell it to keep things simple, modular, crisp, whatever, it generates tons of garbage much too often.
reply
Btw, it may be obvious, but afaik Claude by default only reads CLAUDE.md, not AGENTS.md.
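If that mismatch bites you, a common workaround (a sketch, assuming a Unix-like environment and that your instructions actually live in AGENTS.md) is to symlink one name to the other so both tools read the same file:

```shell
# Hypothetical repo setup: instructions live in AGENTS.md.
# Make CLAUDE.md a symlink so tools that read CLAUDE.md see the same content.
echo "Keep modules small; no speculative abstractions." > AGENTS.md
ln -sf AGENTS.md CLAUDE.md

# Both names now resolve to the same file
diff AGENTS.md CLAUDE.md && echo "in sync"
```

The upside of the symlink over copying is that there's only one file to edit, so the two can never drift apart.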
reply
And yet still less often than the average developer.
reply
I think the issue is deeper than prompts, AGENTS.md, smart flows, etc. The problem is that LLMs are searchers, trained to prefer some results. So if the dumb solution is in their training distribution and the smart solution is not, they won't spit it out.
reply