I think this is the main point where many people’s work differs. For most of my work I know roughly what needs changing and how things are structured, but I jump between codebases often enough that I can’t always remember the exact classes/functions where changes are needed. I can vaguely gesture at the changes that need to be made, have the AI find the places that need changing, and then review the result.
I rarely get the luxury of working in a single codebase for a long enough period of time to get so familiar with it that I can jump to particular functions without much thought. That means AI is usually a better starting point than me fumbling around trying to find what I think exists but I don’t know where it is.
I'm thinking about how to solve the problem and how to express it in the programming language such that it is easy to maintain. Getting someone/something else to do that doesn't help me.
But different strokes for different folks, I suppose.
And I have the AI deal with "knowing how to do it" as well. Often it's slower to have it do enough research to know how to do it, but my time is more expensive than Claude's time, and so as long as I'm not sitting around waiting it's a net win.
Gemini was running a benchmark and everything ran smoothly for an hour. But on verification it turned out it had hallucinated the model used for judging, invalidating the whole run.
Another task used Opus and I manually specified the model to use. It still used the wrong model.
This type of hallucination has happened to me at least 4-5 times in the past fortnight using Opus 4.6 and gemini-3.1-pro. GLM-5 does not seem to hallucinate as much.
So if you are not actively monitoring your agent and making the corrections, you need something else that is.
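For what it's worth, the "something else" can be pretty dumb. Here is a minimal sketch in Python, assuming the run writes out which judge model each sample actually used. `RunRecord`, `find_model_mismatches`, `verify_run`, and the model names are all made up for illustration; they are not part of any real harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical metadata a benchmark run records for each judged sample."""
    sample_id: str
    judge_model: str


def find_model_mismatches(records, expected_model):
    """Return the records whose judge model differs from the one requested."""
    return [r for r in records if r.judge_model != expected_model]


def verify_run(records, expected_model):
    """Raise as soon as any sample was judged by an unexpected model."""
    mismatches = find_model_mismatches(records, expected_model)
    if mismatches:
        ids = ", ".join(r.sample_id for r in mismatches)
        raise ValueError(
            f"expected judge model {expected_model!r}, "
            f"but these samples used something else: {ids}"
        )


if __name__ == "__main__":
    # Placeholder model names, not real identifiers.
    records = [
        RunRecord("sample-1", "judge-model-a"),
        RunRecord("sample-2", "judge-model-b"),
    ]
    try:
        verify_run(records, "judge-model-a")
    except ValueError as err:
        print(f"run rejected: {err}")
```

Running a check like this after the first few samples, rather than at the end, would presumably catch the wrong judge in minutes instead of after an hour.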
Also, instead of just prompting, it helps to have the AI write a quick plan of exactly what it will do before I hit go, including class names, branch names, file locations, specific tests, etc. The outline is smaller than the code and quicker to correct.
That takes more wall clock time per agent, but gets better results, so fewer redo steps.
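To make that concrete, this is roughly the shape of plan I mean, sketched in Python. The `ChangePlan` fields, the helper names, and the example paths are my own assumptions, not any tool's format. The point is only that a plan with an empty section is easy to spot and bounce back before any code gets written.

```python
from dataclasses import dataclass, field


@dataclass
class ChangePlan:
    """Hypothetical outline the agent fills in before touching any code."""
    branch: str
    files: list[str] = field(default_factory=list)
    classes: list[str] = field(default_factory=list)
    tests: list[str] = field(default_factory=list)


def missing_sections(plan):
    """Return the names of sections the agent left empty."""
    sections = {
        "branch": plan.branch,
        "files": plan.files,
        "classes": plan.classes,
        "tests": plan.tests,
    }
    return [name for name, value in sections.items() if not value]


def render(plan):
    """Format the plan as a short plain-text outline for human review."""
    lines = [f"branch: {plan.branch}"]
    for title, items in (
        ("files", plan.files),
        ("classes", plan.classes),
        ("tests", plan.tests),
    ):
        lines.append(f"{title}:")
        lines.extend(f"  - {item}" for item in items)
    return "\n".join(lines)


if __name__ == "__main__":
    # Example values are placeholders.
    plan = ChangePlan(
        branch="fix-judge-model",
        files=["eval/run.py"],
        tests=["tests/test_run.py"],
    )
    print(render(plan))
    print("missing:", missing_sections(plan))
```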
Hopefully there is some sort of lint process to catch my human hallucinations and typos.
This is exactly the sort of future I'm afraid of, where the people who are ostensibly hired to know how stuff works outsource that understanding to their LLMs. If you don't know how the system works while building it, what are you going to do when it breaks? Continue to throw your LLM at it? At what point do you just outsource your entire brain?
I’m not sure it’s really true in practice yet, but that would certainly be the claim.
Because after they've finished executing, I guess you still have to "check" their output, integrate their results into the bigger project they're (supposedly) part of, etc., and for me the context-switching required to do all that is mentally taxing. But maybe this only happens because my brain is not young enough; that's why I'm asking.