The chasm between "Software Developer" and "Software Engineer" is getting wider. Articles like this and the comments under it give away who is an Engineer and who is just a coder.
I have found this to be very effective as well. However, it's so easy to do, I can't imagine they won't build it in.
The harnesses will improve and the loop of "self-review, judge what needs clean up, do the refactoring, repeat until clean" will get included in the one-shot. They are already doing this somewhat; they'll just get a lot better at it, and as the models get faster and cheaper to run, the refactoring churn at the end of each task won't even create a noticeable delay.
I do not think the high-level "taste" knowledge that I've built up -- when to break something off into its own service, what to put in the DB vs cache vs queues vs blob storage, how to isolate important logic in pure functional layers so it can be tested and validated independently -- is any more "unlearnable" to AI than the stuff I previously considered impressive that's now one-shottable like "write a Prolog implementation from scratch".
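To make the "pure functional layers" point concrete, here is a minimal sketch in Python. Everything in it (the Order type, the discount rule, the load_order and save_total callbacks) is a hypothetical stand-in, not anything from a real system: the decision logic lives in a pure function, and the I/O is pushed out to a thin shell.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    # Hypothetical domain type, for illustration only.
    subtotal_cents: int
    is_member: bool


def discount_cents(order: Order) -> int:
    """Pure: the result depends only on the argument, no I/O, no globals."""
    # Assumed rule: members get 10% off orders of 10,000 cents or more.
    if order.is_member and order.subtotal_cents >= 10_000:
        return order.subtotal_cents // 10
    return 0


def checkout(order_id, load_order, save_total):
    """Thin shell: all I/O is passed in, so the shell stays trivial."""
    order = load_order(order_id)
    total = order.subtotal_cents - discount_cents(order)
    save_total(order_id, total)
    return total
```

The rule in discount_cents can be checked with plain asserts and no database, and checkout can be exercised with fake callbacks.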
And yes, right now you still need the architectural and system design knowledge because the LLM will fuck that up. We'll all find out if that continues being needed in the future. From what I understand about LLMs and how they work, I doubt it, but also, yeah, I doubted it would've gotten this far when I think back 2+ years ago.
Also, maybe I should be clear, I pretty much never one-shot things. My sessions with Claude or other CLI tools always start with a bit of a conversation until we converge on a good plan, Claude builds the code, we discuss some more, then we iterate.
If I'd had AI tooling at the time I'd probably have been more inclined to have it both refactor / optimize the existing application, add automated regression tests etc, and use it to extract all of the application's features and requirements for a potential rebuild.
But honestly I think if that application had been properly designed and factored (instead of nesting JS in HTML in strings in JS, or concatenating XML from query results only for it to be converted to JSON, taking up 50% of response time) its lifetime could've been extended, especially if it was then containerized with HHVM or a similar PHP optimizer.
But, hindsight.
One of my particular complaints is how code-gen LLMs tend to re-create the same code over and over again. Case in point: a use case where a team name is generated from a list of team member names. The LLM re-generates this code inline every time it needs to display the team name, rather than simply writing and reusing a utility-style function.
I know I need to fix this. At this point I'm planning to just prompt something like "please list all the places where team names are generated/calculated", plus manually search through the codebase, then perform the abstraction myself. But I'm unsure how to prevent this (both this example, and other cases that could benefit from similar utility functions) continuing to occur in the future.
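For illustration, the kind of shared helper meant here could be as small as the sketch below. The function name, the formatting rule (first few names, then a count) and the "Unnamed team" fallback are all hypothetical; the actual rule in the codebase isn't stated here.

```python
def team_display_name(member_names: list[str], max_listed: int = 3) -> str:
    """Build a team name from member names in one place.

    The format is an assumption for this example: list up to
    max_listed names, then summarise the rest as "+N more".
    """
    # Drop empty or whitespace-only entries and trim the rest.
    cleaned = [name.strip() for name in member_names if name and name.strip()]
    if not cleaned:
        return "Unnamed team"

    listed = cleaned[:max_listed]
    remaining = len(cleaned) - len(listed)
    base = ", ".join(listed)
    if remaining > 0:
        return f"{base} +{remaining} more"
    return base
```

Every screen that shows a team name would then call team_display_name(...) instead of rebuilding the string, and a search for inline joins over member names is one way to find the places that still need converting.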
Once the LLM tells me "Okay, it's done, everything works" I always ask it to do a thorough review. I tell it to split up the work among sub-agents, with each one taking on a specific responsibility (look for code smells, look for bad architecture, review the data access model, DUPLICATE CODE, testability and unit testing, etc.).
After a certain number of revisions and reviews you'll come to accept the shortcomings it comes back with. Usually there will be specific design decisions you made that the LLM keeps bringing up. Once the review only brings up those and maybe some other minor issues, it's time to move on.
I don't overly rely on markdown files and directions. I don't rely on tooling around it either. I just don't trust the LLM when it says "all done", tests pass, and deployment works. I make it do multiple reviews and iterations even when it thinks it's done.
Understand what you're writing. If you never build up the mental model of what the code is doing you'll never be able to discern what is slop and what isn't. There are no shortcuts.
Piling more prompts on might get you to the same end result, but without understanding you'll never know when you're there.
Organisations just don't want to deal with the accountability involved with "touching cold code". Whether it's a human or "AI agent" doesn't change the "It worked in prod, you touched it, you broke it, never touch anything again" dynamic.
But there's risk associated with every change, and it takes time to review, QA, monitor the rollout, communicate to stakeholders, etc.
The refactor itself may be the smallest part of it.
Yes. In practice, this does not weigh against organisational resistance.
AI really makes it worse by adding an explicit numerical cost to doing anything.