Absolutely. I've got a nice multi-paragraph prompt on hunting for subtle bugs, user expectation breaks, crufty/repeated code, useless tests (six tests that actually should be one logical flow; assertions that a ternary is still, indeed, a ternary; etc.), documentation gaps, and a few other bits and bobs.

I sic Opus, GPT5.4, and Gemini on it and have them each write their own hitlist. Then a warden Opus instance tries to counterprove the findings and composes a final hitlist for me, and a fresh-context instance goes and fixes the hitlist.
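For anyone who wants the shape of that pipeline, here is a rough sketch in Python. It is not the actual setup: `Finding`, `build_hitlist`, and the reviewer/warden callables are all hypothetical names, and the model calls are left as plain functions you would wire up to whatever client you use. It only covers the review and counterproof steps; the manual validation and the fresh-context fix pass are not modeled.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical types: a reviewer takes (prompt, code) and returns finding
# summaries; a warden takes (finding, code) and returns True if the finding
# survives its attempt to counterprove it.
Reviewer = Callable[[str, str], list[str]]


@dataclass(frozen=True)
class Finding:
    reviewer: str  # which model reported it first
    summary: str   # one-line description of the issue


Warden = Callable[[Finding, str], bool]


def build_hitlist(
    prompt: str,
    code: str,
    reviewers: dict[str, Reviewer],
    warden: Warden,
) -> list[Finding]:
    """Collect each reviewer's hitlist, drop duplicates, keep what the warden upholds."""
    candidates: list[Finding] = []
    seen: set[str] = set()
    for name, review in reviewers.items():
        for summary in review(prompt, code):
            # Naive dedup on normalized text; real findings would need fuzzier matching.
            key = summary.strip().lower()
            if key in seen:
                continue
            seen.add(key)
            candidates.append(Finding(reviewer=name, summary=summary.strip()))
    # The warden pass filters out findings it manages to counterprove.
    return [finding for finding in candidates if warden(finding, code)]
```

Keeping the reviewers and warden as plain callables means the filtering logic can be tried out with stub functions before any real model is attached.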

They always find some little niggling thing, or inconsistency, or code organization improvement. They absolutely introduce more churn into the codebase than is necessary, but the things they catch are still a net positive, and I validate each item on the final hitlist, often editing things out if they're being overeager or have found a one-in-a-million bug that's just not worth the fix. (Lately, one agent keeps getting hung up on "what if the device returns invalid serial output", in which case "yeah, we crash" is a perfectly fine response.)

Mind sharing that prompt? This is one of my favorite uses for AI too, but I’m just using it to fix the stuff that’s already top of mind for me.