Of course this requires being fortunate enough to have one of those AI-positive employers where you can spend lots of money on clankers.
I don't review every move it makes; instead, my workflow is to first ask it questions about the code, and it looks around and explores various design choices. Then I nudge it towards the design choice I think is best, and so on. That asking around about the code also loads up the context in the right way, so the AI knows how to make the change well.
It's a me-in-the-loop workflow, but it prevents a lot of bugs, makes me aware of the design choices, and, thanks to fast mode, it's more pleasant and much faster than doing it manually.
On the one hand, reviewing and micromanaging everything it does is tedious and unrewarding. Unlike reviewing a colleague's code, you're never going to teach it anything; maybe you'll get some skills out of it, if you find something that comes up often enough to be worth writing a skill for. And this gets you, at best, a slight speedup over writing it yourself, since you have to stay engaged and think about everything that's going on.
On the other hand, you can just let it grind away agentically and only test the final output. That gets you those huge gains at first, but it can easily start accumulating more and more cruft, bad design decisions, and hacks on top of hacks. And you increasingly don't know what it's doing or why; you're losing the skill of even being able to tell, because you're not exercising it.
You're just building yourself a huge pile of technical debt. You might delete your prod database without realizing it. You might end up with an auth system that doesn't actually check the auth, so someone can log in just by setting an admin username in a cookie. Or whatever; you have no idea, and even if the model gets it right 95% of the time, do you want to be periodically rolling a d20 where a 1 means you lose everything?
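To make that auth failure mode concrete, here's a hypothetical sketch of the broken pattern (Express with cookie-parser; the route and cookie names are made up): the server trusts a client-controlled cookie as the entire identity check.

```ts
// Hypothetical sketch of the broken pattern; names are illustrative.
import express from "express";
import cookieParser from "cookie-parser";

const app = express();
app.use(cookieParser());

app.get("/admin", (req, res) => {
  // BUG: anyone can send "Cookie: user=admin" and get in. Nothing here
  // verifies a session, a signature, or a password.
  if (req.cookies.user === "admin") {
    res.send("welcome, admin");
  } else {
    res.status(403).send("forbidden");
  }
});

app.listen(3000);
```

It passes every happy-path test and looks plausible in a diff, which is exactly why you won't catch it by only testing the final output.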
Maybe I’m just weird (actually, that’s a given), but I don’t mind babysitting the clanker while it works.
The agent only has access to exactly what it needs, whether it's an implementation agent, an analysis agent, or a review agent.
That makes it very easy to stay in command without having to sit and approve tons of random things the agent wants to do.
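A minimal sketch of that per-role gating; the role names and tool names here are illustrative, not any particular product's configuration schema.

```ts
// Hypothetical per-role tool allowlists; role and tool names are made up.
type Role = "implementation" | "analysis" | "review";

const allowedTools: Record<Role, ReadonlySet<string>> = {
  implementation: new Set(["read_file", "write_file", "build", "run_tests"]),
  analysis: new Set(["read_file", "search", "web_search"]),
  review: new Set(["read_file", "git_diff"]),
};

function authorize(role: Role, tool: string): void {
  if (!allowedTools[role].has(tool)) {
    // Hard reject instead of prompting: the agent simply never gets the tool.
    throw new Error(`tool "${tool}" is not available to the ${role} agent`);
  }
}

authorize("review", "git_diff");   // ok
authorize("review", "write_file"); // throws: review agents can't write
```

The point is that the rejection happens before the model ever sees the tool, so there's nothing left to approve interactively.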
I do not allow bash or any kind of shell. I don't want to have to constantly figure out what some random Python script it's made up is supposed to do.
Both OpenCode and VS Code support this. I think in Claude Code you can do it with skills now.
The other benefit is that the MCP tool can mediate noisy build tool output and reduce token usage by showing only errors or test failures and nothing else, or simply an OK response with the build result or test count.
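A rough sketch of what that mediation can look like with the TypeScript MCP SDK; the `npm test` command and the failure-line filter are assumptions, not the setup described above.

```ts
// Sketch of an MCP tool that runs the test suite and reports only failures.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);
const server = new McpServer({ name: "build-tools", version: "0.1.0" });

server.tool(
  "run_tests",
  { pattern: z.string().optional().describe("optional test name filter") },
  async ({ pattern }) => {
    try {
      await run("npm", ["test", ...(pattern ? ["--", "-t", pattern] : [])]);
      // Green run: one short line, not pages of runner output.
      return { content: [{ type: "text", text: "OK: all tests passed" }] };
    } catch (err: any) {
      // Non-zero exit: surface only the lines that look like failures.
      const out = `${err.stdout ?? ""}${err.stderr ?? ""}`;
      const failures = out
        .split("\n")
        .filter((line) => /FAIL|✕|Error/.test(line))
        .join("\n");
      return { content: [{ type: "text", text: failures || "tests failed" }] };
    }
  },
);

await server.connect(new StdioServerTransport());
```

On a green run the model sees one short line instead of the runner's full chatter; on a red run it sees only the failing lines.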
So far, I have not needed to give them access to more than build tools, git, and a project/knowledge system (e.g. Obsidian) for the work I have them doing. Well, that and file read/write and web search.
I use Cursor, but it's getting expensive lately, so I'm trying to reduce context size and move to OpenCode or something like it that I can use with a cheaper provider and Kimi 2.5 or whatever.
BTW, one tip is to look at the size of the codebase. When you see 100KLOC for a first draft of a C compiler, you know something has gone horribly wrong. I would suggest that you at least compare the number of lines the agent produced to what you think the project should take. If it's more than double, the code is in serious, serious trouble. If it's in the <1.5x range, there's a chance it could be saved.
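If you want that check to be more than eyeballing, a crude line count against your own budget is enough; a sketch, where the directory, extensions, and budget are placeholders:

```ts
// Crude LOC check: compare what the agent produced against your own estimate.
import { readdirSync, readFileSync } from "node:fs";
import { join, extname } from "node:path";

function countLines(dir: string, exts: Set<string>): number {
  let total = 0;
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const p = join(dir, entry.name);
    if (entry.isDirectory()) {
      if (entry.name !== "node_modules" && entry.name !== ".git") {
        total += countLines(p, exts);
      }
    } else if (exts.has(extname(entry.name))) {
      total += readFileSync(p, "utf8").split("\n").length;
    }
  }
  return total;
}

const budget = 10_000; // what you think the project should take
const actual = countLines("src", new Set([".c", ".h", ".ts"]));
console.log(`${actual} lines, ${(actual / budget).toFixed(2)}x budget`);
// More than 2x the budget: serious trouble. Under ~1.5x: maybe salvageable.
```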
Asking the agent questions is good, as an aid to a review, not as a substitute. The agents lie frequently enough to be a serious problem.
The models don't yet write code anywhere near human quality, so they require much closer supervision than a human programmer.
You could have it build something that takes fewer lines of code, but you aren’t going to find much with that level of specification and guardrails.
It has about doubled my development pace. An absolutely incredible gain in a vacuum, though tiny compared to what people seem to manage without these self-constraints. But in exchange, my understanding of the code is as comprehensive as if I had paired on it, or merged a direct report's branch into a project I was responsible for. A reasonable enough tradeoff, for me.
anonu has explicitly said that they've wiped a database twice as a result of agents doing stuff. What sort of diff would help against an agent running commands without your approval?
$ main-app git:(main) kubectl get pods | grep agent | head -n 1 | sed -E 's/[a-z]+-agent(.*)/app-agent\1/'
app-agent-656c6ff85d-p86t8 1/1 Running 0 13d
The agent is fully capable of making PRs etc. if you provide appropriate tooling. It wipes the DB, but the DB is just a separate ephemeral pod. One day perhaps it will find a 0-day and break out, but so far it has not. The diff: +8000 -4000
I also don't find the permissions it prompts for very meaningful. Permission to use a file search tool? Permission to make a web request? It's a clumsy way to slow it down enough for me to catch up.