I think Martin Fowler's "Refactoring" might give a bit of insight here. One of my takeaways after reading that book is that the specific implementation of a function is not very important if you have tests. He argues that it can sometimes be easier to completely rewrite a function than to take the time to understand it, as long as you can validate that your rewrite behaves the same way. This mindset lines up pretty closely with how I've been using LLMs.

If that's true, then I would think the emphasis in code review should be more on test quality and verifying that the spec is captured accurately, and as you suggest, the actual implementation is less important.
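Fowler's rewrite idea can be made concrete with characterization tests: pin the observable behavior of the old function, then check the rewrite against the same table. A minimal sketch (the `slugify_v1`/`slugify_v2` functions are hypothetical stand-ins, not anything from the book):

```python
import re

def slugify_v1(title):
    # Hypothetical original: works, but hard to read at a glance.
    out = ""
    for ch in title.lower():
        out += ch if ch.isalnum() else "-"
    return re.sub("-+", "-", out).strip("-")

def slugify_v2(title):
    # Clean rewrite; its correctness rests on the tests below,
    # not on understanding the original line by line.
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)

# Characterization tests: both implementations must agree on the
# behavior we care about before the rewrite can replace the original.
for case in ["Hello, World!", "  spaces  ", "already-slugged", ""]:
    assert slugify_v1(case) == slugify_v2(case)
```

The point is that once the test table captures the behavior, swapping the implementation is cheap, which is exactly the property that matters when an LLM is doing the rewriting.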

reply
This is why I've been pushing back on the "just have the AI generate the tests!" mentality. Sure, let it help you, but those tests are the guarantee of quality and fit for purpose. If you vibe code them, how the hell do you know if it even does what you think it does?

You should be planning out the tests to properly exercise the spec, and ensuring those tests actually do what the spec requires. AI can suggest more tests (but be careful here, too, because a ballooned test suite slows down CI/CD), but it should never be in charge of them completely.
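One way to keep the spec in charge is to hand-write the test table from the spec, one row per clause, and judge the implementation (LLM-written or not) against it. A sketch, assuming a hypothetical `parse_price` function whose spec says: strip the currency symbol, tolerate whitespace, allow zero, reject negatives:

```python
def parse_price(text):
    # Hypothetical function under test (imagine it was LLM-generated).
    cleaned = text.strip().lstrip("$")
    value = float(cleaned)
    if value < 0:
        raise ValueError("price must be non-negative")
    return round(value, 2)

# Hand-written table derived from the spec, one row per clause.
SPEC_CASES = [
    ("$19.99", 19.99),  # spec: strip currency symbol
    ("  5 ", 5.00),     # spec: tolerate surrounding whitespace
    ("0", 0.00),        # spec: zero is allowed
]

for text, expected in SPEC_CASES:
    assert parse_price(text) == expected

# spec: negative prices are rejected
try:
    parse_price("-1")
except ValueError:
    pass
else:
    raise AssertionError("negative price accepted")
```

If the AI suggests extra rows, a human can still eyeball each one against the spec; the table stays small and legible.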

reply
A related book I've been thinking about in terms of LLMs is "Working Effectively With Legacy Code". I'd love to be able to work a lot of that advice into some kind of Skill or customized agent to help with big refactors.
reply
Oh gosh - now that you mention it, it was "Working Effectively with Legacy Code" that I was thinking of, not "Refactoring".
reply
That's my experience with agentic development so far: a lot of extra time goes into testing.

Problem is, the way I've been trained to test isn't exactly antagonistic. QA does that kind of thing. Tests written by programmers are generally more like spot checks, which only make sense if the code is broadly understood and trustworthy. Code produced by LLMs is usually broken in subtle, hard-to-spot ways.
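One cheap way to get more antagonistic than hand-picked spot checks is property-based testing (the `hypothesis` library does this properly): generate lots of inputs and assert spec-level properties instead of specific outputs. A stdlib-only sketch of the idea, with a hypothetical LLM-written `dedupe` helper as the target:

```python
import random

def dedupe(items):
    # Hypothetical LLM-written helper under test: drop duplicates, keep order.
    seen = set()
    return [x for x in items if not (x in seen or seen.add(x))]

# Poor man's property-based test: hammer the function with random
# inputs and check properties derived from the spec, not from
# reading the implementation.
for trial in range(1000):
    items = [random.randint(-5, 5) for _ in range(random.randint(0, 20))]
    result = dedupe(items)
    assert len(result) == len(set(result))       # no duplicates remain
    assert set(result) == set(items)             # nothing lost or invented
    assert result == list(dict.fromkeys(items))  # first-occurrence order kept
```

The properties survive a full rewrite of `dedupe`, which is exactly what spot checks pinned to one implementation don't do.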

reply
Counter-point: developers who get used to not caring about function implementations will culturally also care less about test implementations, making this proposed ideal impossible.
reply
With LLMs, tests cost almost no effort but provide tremendous value.
reply
And you know those tests are correct how?
reply
Look at what they are testing.
reply
> There may be an argument for leaning less on code review. When code is expensive to produce and is likely to stay in production for many years it's obviously important to review it very carefully. If code is cheap and can be inexpensively replaced maybe we can lower our review standards?

Agree with everything else you said except this. In my opinion, this assumes code becomes more like a consumable as code-production costs fall. But I don't think that's the case. Incorrect, but not visibly incorrect, code will sit in place for years.

reply
> Agree with everything else you said except this.

Yeah, I'm not sure I agree with what I said there myself!

> Incorrect, but not visibly incorrect, code will sit in place for years.

If you let incorrect code sit in place for years I think that suggests a gap in your wider process somewhere.

I'm still trying to figure out what closing those gaps looks like.

The StrongDM pattern is interesting - having an ongoing swarm of testing agents which hammer away at a staging cluster trying different things and noting stuff that breaks. Effectively an agent-driven QA team.
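The shape of that swarm, stripped to its bones, is a loop of randomized probes plus a failure log. A toy sketch; `StagingClient` and the seeded bug are invented stand-ins for a real staging API or browser driver, not anything from StrongDM's actual setup:

```python
import random

class StagingClient:
    # Stand-in for a real staging environment; a real swarm would hit
    # an API or drive a browser instead.
    def __init__(self):
        self.cart = []

    def add_item(self, item):
        self.cart.append(item)

    def checkout(self):
        if not self.cart:
            raise RuntimeError("checkout with empty cart")  # seeded bug
        self.cart = []

ACTIONS = ["add_item", "checkout"]
failures = []

for run in range(200):
    client = StagingClient()
    random.seed(run)  # reproducible runs so failures can be replayed
    for _ in range(5):
        action = random.choice(ACTIONS)
        try:
            if action == "add_item":
                client.add_item("widget")
            else:
                client.checkout()
        except Exception as exc:
            failures.append((run, action, str(exc)))  # note what broke

print(f"{len(failures)} failing interactions found")
```

An agent-driven version replaces `random.choice` with an LLM picking "interesting" next actions, but the triage step, someone reading the failure log, stays human for now.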

I'm not going to add that to the guide until I've heard it working for other teams and experienced it myself though!

reply
This kinda gets into the idea of AIs as droids right?

So, you have a code writing droid that is aligned towards writing good clean code that humans can read. Then you have an implementation droid that goes into actually launching and running the code and is aligned with business needs and expenses. And you have a QA droid that stress tests the code and is aligned with the hacker mindset and is just slightly evil, so to speak.

Each droid works with the others to make good code, but they are also independent and adversarial in the day-to-day.

reply
These are just agents with a different name? People are already working like that today.
reply
Theoretically I'd want a totally different model cross checking the work at some point, since much like an individual may have blind spots, so will a model.
reply
It assumes that bugs are rare and easy to fix. A look at Claude Code's issue tracker (https://github.com/anthropics/claude-code/issues) tells you that this is not so. Your product could be perpetually broken, lurching from one vibe coded bug to another.
reply
> When code is expensive to produce and is likely to stay in production for many years it's obviously important to review it very carefully. If code is cheap and can be inexpensively replaced maybe we can lower our review standards?

I don't care how cheap it is to replace the incorrect code when it's modifying my bank account or keeping my lights on.

reply
Oh, don't worry, even before AI the companies in question were already outsourcing a lot of this to the cheapest companies they could find. We are just very very lucky most of the problems incurred get caught before being foisted on the wider world.
reply
One model I've seen is moving the review stage to the designs, not the code itself.

I.e. have a `planning/designs/unbuilt/...` folder that contains markdown descriptions of features/changes that would have gotten a PR. Now do the review at the design level.

reply