If that's true, then I would think the emphasis in code review should be more on test quality and verifying that the spec is captured accurately, and as you suggest, the actual implementation is less important.
You should be planning out the tests to properly exercise the spec, and ensuring those tests actually do what the spec requires. AI can suggest more tests (but be careful here, too, because a ballooned test suite slows down CICD), but it should never be in charge of them completely.
Problem is, the way I've been trained to test isn't exactly antagonistic. QA does that kind of thing. Programmers writing tests are generally rather doing spot checks that only make sense if the code is generally understood and trustworthy. Code LLMs produce is usually broken in subtle, hard to spot ways.
Agree with everything else you said except this. In my opinion, this assumes code becomes more like a consumable as code-production costs reduce. But I don't think that's the case. Incorrect, but not visibly incorrect, code will sit in place for years.
Yeah, I'm not sure I agree with what I said there myself!
> Incorrect, but not visibly incorrect, code will sit in place for years.
If you let incorrect code sit in place for years I think that suggests a gap in your wider process somewhere.
I'm still trying to figure out what closing those gaps looks like.
The StrongDM pattern is interesting - having an ongoing swarm of testing agents which hammer away at a staging cluster trying different things and noting stuff that breaks. Effectively an agent-driven QA team.
I'm not going to add that to the guide until I've heard it working for other teams and experienced it myself though!
So, you have a code writing droid that is aligned towards writing good clean code that humans can read. Then you have an implementation droid that goes into actually launching and running the code and is aligned with business needs and expenses. And you have a QA droid that stress tests the code and is aligned with the hacker mindset and is just slightly evil, so to speak.
Each droid is working together to make good code, but also are independent and adversarial in the day to day.
I don't care how cheap it is to replace the incorrect code when it's modifying my bank account or keeping my lights on.
I.e. have a `planning/designs/unbuilt/...` folder that contains markdown descriptions of features/changes that would have gotten a PR. Now do the review at the design level.