When I was really early in my career, a mentor told me that code review is not about catching bugs but spreading context (i.e. increasing bus factor.) Catching bugs is a side effect, but unless you have a lot of people review each pull request, it's basically just gambling.

The more expensive and less sexy option is to actually make testing easier (both programmatically and manually), write more tests and more levels of tests, and spend time reducing code complexity. The problem, I think, is people don't get promoted for preventing issues.

reply
This depends on the industry. I work on industrial machine control software, and we spend a huge amount of time on tests. We have to for some parts (human-safety critical), but other parts would just be expensive if they failed (loss of income for customers, and possibly damaged equipment).

The key to making this scalable is to make as few parts as possible critical, and make the potential bad outcomes as benign as possible. (This lets you go to a lower rating in whatever safety standard applies to your industry.) You still need tests for the less critical parts, though: while downtime is better than injury, you need a good track record if you want to sell future machines to your customers. At least if you don't want to compete on cost.

reply
> make as few parts as possible critical, and make the potential bad outcomes as benign as possible

This is a good lesson for anyone I think. Definitely something I’m going to think more about. Thanks for sharing!

reply
One of the major things code review does is prevent that one guy on your team who is sloppy or incompetent from messing up the codebase without singling him out.

If you told someone "I don't trust you, run all code by me first" it wouldn't go well. If you tell them "everyone's code gets reviewed" they're ok with it.

reply
> people don't get promoted for preventing issues.

They do, but only after a company has been burned hard. They can also be promoted when their area is so much better that everyone notices.

Still, the best way to a promotion is to write a major bug that you can come in at the last moment and be the hero for fixing.

reply
That could work, but plenty of quiet heroes weren’t promoted for fixing critical bugs.
reply
They fixed it too soon. You have to wait until the effect is visible on someone's dashboard somewhere.
reply
Goodhart's Law strikes again... "When a measure becomes a target, it ceases to be a good measure."
reply
You have to make sure it doesn't arrive at you before it is on the dashboard. Otherwise you are the reason the time-to-fix-a-bug metric is blowing up. Unless you can make the problem so obscure that the other smart people asked to help you can't figure it out either, thus making you look bad.
reply
That is in no way guaranteed. Sometimes finding too many security issues makes you unpopular.

Two years afterward, we got hit with ransomware. And obviously "I told you so" isn't a productive discussion topic at that point.

reply
That's not preventing the issue, though. The closest you can get to this is to have a competitor be burned hard and then demonstrate how your code base has the exact same issue. But even that isn't guaranteed. "That can't happen here" is a hard mindset to disrupt unless you yourself are already in the C-suite.
reply
I think of code review as being more about ensuring understandability. When you spend hours gathering context, designing, iterating, debugging, and finally polishing a commit, your ability to judge the readability of your own change has been tainted by your intimate familiarity with it. When you get a fresh pair of eyes to read it and leave comments like "why did you do it this way" or "please refactor to use XYZ for maintainability", you end up with something that will be easier to navigate and maintain for the junior interns who will end up fixing your latent bugs 5 years later.
reply
Alternately, have a small team where you trust everyone.
reply
> The problem, I think, is people don't get promoted for preventing issues.

Cleaning up structural issues across a couple of orgs is a senior => principal promo I've seen a couple of times.

reply
Expert reviews are just about the only thing that makes AI generated code viable, though doing them after the fact is a bit sketchy; to be efficient you kinda need to keep an eye on what the model is doing as it's working.

Unchecked, AI models output code that is as buggy as it is inefficient. In smaller green field contexts, it's not so bad, but in a large code base, it performs much worse as it will not have access to the bigger picture.

In my experience, you should be spending something like 5-15X the time the model takes to implement a feature on reviewing and making it fix its errors and inefficiencies. If you do that (with an expert's eye), the changes will usually have a high quality and will be correct and good.

If you do not do that due diligence, the model will produce a staggering amount of low quality code, at a rate that is probably something like 100x what a human could output in a similar timespan. Unchecked, it's like having a small army of the most eager junior devs you can find going completely fucking ape in the codebase.

reply
If you spend 5-15x the time reviewing what the LLM is doing, are you saving any time by using it?
reply
No, but that's the crux of the AI problem in software. Time to write code was never the bottleneck. AI is most useful for learning, either via conversation or by seeing examples. It makes writing code faster too, but only a little after you take review into account. The cases where it shines are high-profile and exciting to managers, but not common enough to make a big difference in practice. E.g., AI can one-shot a script to get logs from a paginated API, convert them to ndjson, and save them to files grouped by week, with minimal code review, but only if I'm already experienced enough to describe those requirements, and, most importantly, that's not what I'm doing every day anyway.
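To make that example concrete, here is a rough sketch of what such a one-shot script might look like. Everything about the API is an assumption for illustration: `fetch_page` is a hypothetical callable that takes a cursor and returns `{"items": [...], "next": cursor_or_None}`, and each record is assumed to carry an ISO 8601 `timestamp` field.

```python
import json
from collections import defaultdict
from datetime import datetime
from pathlib import Path
from typing import Callable, Dict, Iterable, Iterator, Optional


def iter_logs(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield records from every page until the API stops returning a cursor."""
    cursor = None
    while True:
        # Hypothetical page shape: {"items": [...], "next": "cursor" or None}
        page = fetch_page(cursor)
        yield from page["items"]
        cursor = page.get("next")
        if cursor is None:
            break


def week_key(record: dict) -> str:
    """Return an ISO year-week label such as '2025-W02' for a record."""
    ts = datetime.fromisoformat(record["timestamp"])  # assumed field name
    year, week, _ = ts.isocalendar()
    return f"{year}-W{week:02d}"


def write_weekly_ndjson(records: Iterable[dict], out_dir: Path) -> Dict[str, int]:
    """Append each record as one JSON line to the file for its ISO week."""
    out_dir.mkdir(parents=True, exist_ok=True)
    counts: Dict[str, int] = defaultdict(int)
    for record in records:
        key = week_key(record)
        with open(out_dir / f"{key}.ndjson", "a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")
        counts[key] += 1
    return dict(counts)
```

Note that the files are opened in append mode, so rerunning the sketch against the same output directory would duplicate lines. That is exactly the kind of requirement you need the experience to spot and spell out.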
reply
I'm finding that in some cases I'm dealing with even more code, given how much code AI outputs. So yeah, for some tasks I find myself extremely fast, but for others I find myself spending ungodly amounts of time reviewing code I never wrote to make sure it doesn't destroy the project with unforeseen convincing slop.
reply
A related Dirty Secret that's going to become clear from all this is that a very large proportion of code in the wild (yes, even in 2026—maybe not in FAANG and friends, IDK, but across all code that is written for pay in the entire economy) has limited or no automated test coverage, and is often being written with only a limited recorded spec that's usually fleshed out only to the degree needed (very partial) as a given feature is being worked on.

What do the relatively hands-off "it can do whole features at a time" coding systems need to function without taking up a shitload of time in reviews? Great automated test coverage, and extensive specs.

I think we're going to find there's very little time-savings to be had for most real-world software projects from heavy application of LLMs, because the time will just go into tests that wouldn't otherwise have been written, and much more detailed specs that otherwise never would have been generated. I guess the bright-side take of this is that we may end up with better-tested and better-specified software? Though so very much of the industry is used to skipping those parts, and especially the less-capable (so far as software goes) orgs that really need the help and the relative amateurs and non-software-professionals that some hope will be able to become extremely productive with these tools, that I'm not sure we'll manage to drag processes & practices to where they need to be to get the most out of LLM coding tools anyway. Especially if the benefit to companies is "you will have better tests for... about the same amount of software as you'd have written without LLMs".

We may end up stuck at "it's very-aggressive autocomplete" as far as LLMs' useful role in them, for most projects, indefinitely.

On the plus side for "AI" companies, low-code solutions are still big business even though they usually fail to deliver the benefits the buyer hopes for, so there's likely a good deal of money to be made selling companies LLM solutions that end up not really being all that great.

reply
> better-specified software

Code is the most precise specification we have for interfacing with computers.

reply
There are some cases where AI is generating binary machine code, albeit small amounts. What do we have when we don't have the code?
reply
Machine code is still code, even if the representation is a bit less legible than the punch cards we used to use.
reply
You’re missing the point of a spec
reply
Re. productivity, if LLMs are a genuine boost for 1/3 of the work, neutral 1/3 of the time, and actually worse 1/3 of the time, it's likely we aren't really seeing performance improvements because 1) people are using them for everything and 2) we're still learning how to best use them.

So I expect over time we will see genuine performance improvements, but Amdahl's law dictates it won't be as much as some people and CEOs are expecting.
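For anyone who wants the formula: Amdahl's law in its usual form, where p is the fraction of total work that gets accelerated and s is the speedup on that fraction. The numbers in the worked line are illustrative assumptions (coding as 30% of the job, made 10x faster), not measurements.

```latex
S_{\text{overall}} = \frac{1}{(1 - p) + \dfrac{p}{s}}
\qquad \text{e.g. } p = 0.3,\ s = 10:\quad
S_{\text{overall}} = \frac{1}{0.7 + 0.03} \approx 1.37
```

Even letting s go to infinity in that example caps the overall speedup at 1/0.7, roughly 1.43.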

reply
Bingo. Hopefully there are some business opportunities for us in that truth.
reply
> because the time will just go into tests that wouldn't otherwise have been written

Writing tests to ensure a program is correct is the same problem as writing a correct program.

Evaluating conformance is a different category of concern from ensuring correctness. Tests are about conformance not correctness.

Ensuring correct programs is like cleaning in the sense that you can only push dirt around, you can't get rid of it.

You can push uncertainty around but you can't eliminate it.

This is the point of Gödel's theorem. Shannon's information theory observes similar aspects for fidelity in communication.

As Douglas Adams noted: ultimately you've got to know where your towel is.

reply
A competent programmer proves the program he writes correct in his head. He can certainly make mistakes in that, but it’s very different from writing tests, because proofs abstract (or quantify) over all states and inputs, which tests cannot do.
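A small sketch of that gap between sampled tests and reasoning over all inputs. The function and the sampled years are made up for illustration: the first version is deliberately incomplete, yet a plausible-looking test suite passes against it.

```python
def is_leap_year(year: int) -> bool:
    """Deliberately incomplete: ignores the Gregorian century rules."""
    return year % 4 == 0


def correct_is_leap_year(year: int) -> bool:
    """Reference rule: divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# A plausible test suite: every sampled year agrees with the reference rule.
SAMPLED_YEARS = [1996, 2000, 2023, 2024]
assert all(is_leap_year(y) == correct_is_leap_year(y) for y in SAMPLED_YEARS)

# Sweeping a whole range exposes the inputs the samples never touched.
counterexamples = [
    y for y in range(1800, 2101) if is_leap_year(y) != correct_is_leap_year(y)
]
print(counterexamples)  # prints [1800, 1900, 2100]
```

The sweep is still not a proof, since it only covers a finite range, but it shows how a green test run says nothing about the inputs nobody sampled.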
reply
These companies don't care about saving time or lowering operating costs, they have massive monopolies to subsidize their extremely poor engineering practices with. If the mandate is to force LLM usage or lose your job, you don't care about saving time; you care about saving your job.

One thing I hope we'll all collectively learn from this is how grossly incompetent the elite managerial class has become. They're destroying society because they don't know what to do outside of copying each other.

It has to end.

reply
The submitter with their name on the Jira ticket saves time, the reviewer who has to actually verify the work loses a lot of time and likely just lets issues slip through.
reply
To be honest, sometimes it's still beneficial.

For fairly straightforward changes it's probably a wash, but ironically enough it's often the trickier jobs where they can be beneficial as it will provide an ansatz that can be refined. It's also very good at tedious chores.

reply
And spotting stuff in review! Sometimes it’s false positives but on several occasions I’ve spent ~15-30 minutes teaching-reviewing a PR in person, checked afterwards and it matched every one of the points.
reply
Some, but not very much. Writing code is hard. AI will do a lot of tedious code that you procrastinate writing.
reply
Also when you are writing code yourself you are implicitly checking it whilst at the back of your mind retaining some form of the entire system as a whole.

People seem to gloss over this... As a CEO if people don't function like this I'd be awake at night sweating.

reply
That’s the reverse-centaur issue I see: humans are not great at repetitive, nuanced, similar-seeming tasks. Putting the onus on humans to retroactively approve high volumes of critical code has them managing a critical failure mode at their weakest and worst. Automated reviews should be enhancing known good-faith code; manual reviews of high-volume, superficially sound but subversive code are begging for issues over time.

Which results in the software engineering issue I’m not seeing addressed by the hype: bugs cost tens to hundreds of times their coding cost to resolve if they require internal or external communication to address. Even if everyone has been 10x’ed, the math still strongly favours not making mistakes in the first place.

An LLM workflow that yields 10x an engineer but psychopathically lies and sabotages client facing processes/resources once a quarter is likely a NNPP (net negative producing programmer), once opportunity and volatility costs are factored in.

reply
> Even if everyone has been 10x’ed, the math still strongly favours not making mistakes in the first place

The math depends on the importance of the software. A mistake in a typical CRUD enterprise app with 100 users has zero impact on anything. You will fix it when you have time; the important thing is that the app was delivered in a week a year ago and has been solving some problem ever since. It has already made enormous profit if you compare it with today’s (yesterday’s?) manual development that would take half a year and cost millions.

A mistake in nuclear reactor control code would be a totally different thing. Whatever time savings you made on coding are irrelevant if they allowed a critical bug to slip through.

Between the two extremes you thus have a whole spectrum of tasks that either benefit or lose from applying coding with LLMs. And there are also more axes than this low-to-high failure cost, which also affect the math. For example, even a non-important but large app will likely soon degrade into an unmanageable state if developed with too little human intervention, and you will be forced to start from scratch, losing a lot of time.

reply
I have found AI extremely good at finding all those really hard bugs, though. AI is a greater force multiplier when there is a complex bug than in green field code.
reply
Sort of. I work on a system too large for anyone to know the whole thing. Often people who don't know each other do something that will break the other. (Often only because of the number of different people; most individuals go years between incidents like this.)
reply
No I’m keeping up with the system as a whole because I’m always working at a system level when I’m using AI instead of worrying about the “how”
reply
No you’re not. The “how” is your job to understand, and if you don’t you’ll end up like the devs in the article.

We as an industry have been able to offload a lot of “how” via deterministic systems built by humans with expert understanding. LLMs give you the illusion of this.

reply
No in my case the “how” is

1. I spoke to sales to find out about the customer

2. I read every line of the contract (SOW)

3. I did the initial requirements gathering over a couple of days with the client - or maybe up to 3 weeks

4. I designed every single bit of AWS architecture and code

5. I did the design review with the client

6. I led the customer acceptance testing

> We as an industry have been able to offload a lot of “how” via deterministic systems built by humans with expert understanding. LLMs

I assure you the mid level developers or god forbid foreign contractors were not “experts” with 30 years of coding experience and at the time 8 years of pre LLM AWS experience. It’s been well over a decade - ironically before LLMs - that my responsibility was only for code I wrote with my own two hands

reply
Yes, and trusting an LLM here is not a good idea. You know it will make important mistakes.

I’m not saying trusting cheap devs is a good idea either. I do think cheap devs are actually at risk here.

reply
I am not “trusting” either - I’m validating that they meet the functional and non functional requirements just like with an LLM. I have never blindly trusted any developer when my neck was the one on the line in front of my CTO/director or customer.

I didn’t blindly trust the Salesforce consultants either. I also didn’t verify every line of oSql (not a typo) they wrote.

reply
Actually, it's SOQL. I did Salesforce crap for many years.
reply
> Expert reviews are just about the only thing that makes AI generated code viable

I disagree, in the sense that an engineer who knows how to work with LLMs can produce code which only needs light review.

* Work in small increments

* Explicitly instruct the LLM to make minimal changes

* Think through possible failure modes

* Build in error-checking and validation for those failure modes

* Write tests which exercise all paths

This is a means to produce "viable" code using an LLM without close review. However, to your point, engineers able to execute this plan are likely to be pretty experienced, so it may not be economically viable.
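As a rough illustration of the last three bullets (failure modes, validation, tests for every path), here is a minimal sketch. `parse_port` is a made-up example function, not taken from any real project.

```python
def parse_port(value: str) -> int:
    """Parse a TCP port, rejecting each anticipated failure mode explicitly."""
    text = value.strip()
    if not text.isdecimal():  # failure mode 1: empty or not a plain integer
        raise ValueError(f"port must be digits only, got {value!r}")
    port = int(text)
    if not 1 <= port <= 65535:  # failure mode 2: outside the valid port range
        raise ValueError(f"port out of range: {port}")
    return port


def raises_value_error(arg: str) -> bool:
    """Test helper: report whether parse_port rejects the argument."""
    try:
        parse_port(arg)
    except ValueError:
        return True
    return False


# One check per path through the function.
assert parse_port("8080") == 8080   # happy path
assert raises_value_error("http")   # non-numeric input
assert raises_value_error("")       # empty input
assert raises_value_error("0")      # below the valid range
assert raises_value_error("70000")  # above the valid range
```

With a function this small, a reviewer can check the branch list against the test list at a glance, which is the property that makes light review plausible.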

reply
By the time you're working in increments small enough that it doesn't introduce significant issues, you really might as well write the code yourself.
reply
That's not my experience — I'm significantly faster while guiding an LLM using this methodology.

The gains are especially notable when working in unfamiliar domains. I can glance over code and know "if this compiles and the tests succeed, it will work", even if I didn't have the knowledge to write it myself.

reply
> I can glance over code and know "if this compiles and the tests succeed, it will work", even if I didn't have the knowledge to write it myself.

... Errr... Yeah, that's not a great approach, unless you are defining 'work' extremely vaguely.

reply
Haha I have usually found myself on the conservative side of any engineering team I’ve been on, and it’s refreshing to catch some flak for perceived carelessness.

I still make an effort to understand the generated code. If there’s a section I don’t get, I ask the LLM to explain it.

Most of the time it’s just API conventions and idioms I’m not yet familiar with. I have strong enough fundamentals that I generally know what I’m trying to accomplish and how it’s supposed to work and how to achieve it securely.

For example, I was writing some backend code that I knew needed a nonce check but I didn’t know what the conventions were for the framework. So I asked the LLM to add a nonce check, then scanned the docs for the code it generated.
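For readers unfamiliar with the idea, a framework-agnostic sketch of a single-use nonce check is below. This is not the convention of the framework I was using (those details are what I looked up in the docs); `NonceStore` and its in-memory dict are assumptions made for illustration only.

```python
import secrets
import time
from typing import Dict


class NonceStore:
    """In-memory single-use nonce registry (a real service would use shared storage)."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self._ttl = ttl_seconds
        self._issued: Dict[str, float] = {}  # nonce -> expiry on the monotonic clock

    def issue(self) -> str:
        """Create an unguessable nonce and remember when it expires."""
        nonce = secrets.token_urlsafe(32)
        self._issued[nonce] = time.monotonic() + self._ttl
        return nonce

    def consume(self, candidate: str) -> bool:
        """Return True once for a valid, unexpired nonce; False on replay or unknown."""
        # pop() removes the nonce, so a second attempt with the same value fails.
        expires_at = self._issued.pop(candidate, None)
        return expires_at is not None and time.monotonic() <= expires_at
```

Usage would be along the lines of issuing a nonce when rendering a form, then calling `consume` on the submitted value and rejecting the request if it returns False.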

reply
> I'm significantly faster while guiding an LLM using this methodology.

https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...

>When developers are allowed to use AI tools, they take 19% longer to complete issues—a significant slowdown that goes against developer beliefs and expert forecasts. This gap between perception and reality is striking: developers expected AI to speed them up by 24%, and even after experiencing the slowdown, they still believed AI had sped them up by 20%.

If we're being honest with ourselves, it's not making devs work faster. It at best frees their time up so they feel more productive.

reply
Fair point. I have definitely caught myself taking longer to revise a prompt repeatedly after the AI gets things wrong several times than it would have taken to write the code myself.

I'd like to think that I have this under control because the methodology of working in small increments helps me to recognize when I've gotten stuck in an eddy, but I'll have to watch out for it.

I still maintain that the LLM is saving me time overall. Besides helping in unfamiliar domains, it's also faster than me at leaf-node tasks like writing unit tests.

reply
How long will that 19% hold as models grow in capability?
reply
I'm a bit tired of waiting for "tomorrow", so I'll just live in today's world. We'll burn that bridge when we get to it.
reply
That's where the Gell-Mann amnesia will get you, though. As much as it trips up in the domains you're familiar with, it also trips up in unfamiliar domains. You just don't see it.
reply
You're not telling me anything I don't know already. Only a person who accepts that they're fallible can execute this methodology anyway, because that's the kind of mentality that it takes to think through potential failure modes.

Yes, code produced this way will have bugs, especially of the "unknown unknown" variety — but so would the code that I would have written by hand.

I think a bigger factor contributing to unforeseen bugs is whether the LLM's code is statistically likely to be correct:

* Is this a domain that the LLM has trained on a lot? (i.e. lots of React code out there, not much in your home-grown DSL)

* Is the codebase itself easy to understand, written with best practices, and adhering to popular conventions? Code which is hard for humans to understand is also hard for an LLM to understand.

reply
Right, I think the latter part is my concern with AI generated code. Often it isn't easy to read (or as easy to read as it could be), and the harder it is to navigate, the more code problems the AI model introduces.

It introduces unnecessary indirection and additional abstractions, and fails to re-use code. Humans do this too, but AI models can introduce this type of architectural rot much faster, and humans usually notice when things start to go off the rails, whereas an AI model will just keep piling on bad code.

reply
I agree that under default settings, LLMs introduce way too many changes and are way too willing to refactor everything. I was only able to get the situation under control by adding this standing instruction:

    ---
    applyTo: '**'
    ---
    By default:
    Make the smallest possible change.
    Do not refactor existing code unless I explicitly ask.
Under this, Claude Opus at least produces pretty reliable code with my methodology even under surprisingly challenging circumstances, and recent ChatGPTs weren't bad either (though I'm no longer using them). Less powerful LLMs struggle, though.
reply
Besides building web apps for internal use, I’m never going to let AI architect something I’m not familiar with. I couldn’t care less whether it uses “clean code” or what design pattern it uses. Meaning I will go from an empty AWS account to a fully fledged app + architecture because I’ve been coding for 30 years and dealing with every nook and cranny of AWS for a decade.

But I would never do the same for Azure.

reply
I tend to agree. I spent a lot of time revising skills for my brownfield repo, writing better prompts to create a plan with clear requirements, writing a skill/command to decompose a plan, having a clear testing skill to write tests and validate, and finally having a code reviewer step using a different model (in my case it's codex since claude did the development). My last PR was as close to perfect as I have got so far.
reply
Just lead with “You are an expert software engineer…”, easy!
reply
Sadly, the way people become expert in a codebase is through coding. The process of coding is the process of learning. If we offload the coding to AI tools we will never be as expert in the codebase, its complexity, its sharp corners, or its unusual requirements. While you can apply general best practices for a code review you can never do as much as if you really got your hands dirty first.

"Seniors will do expert review" will slowly collapse.

reply
In my experience, inefficient code is rarely the issue outside of data engineering type ETL jobs. It’s mostly architectural. Inefficient code isn’t the reason your login is taking 30 seconds. Yes, I know at Amazon/AWS scale (former employee) every efficiency matters. But even at Salesforce scale, wringing out every bit of efficiency doesn’t matter.

No one cares about handcrafted artisanal code as long as it meets both functional and non functional requirements. The sooner geeks get over themselves thinking they are some type of artists, the happier they will be.

I’ve had a job that requires coding for 30 years, and before that I was a hobbyist. I’ve worked everywhere from 60-person startups to BigTech.

For my last two projects (consulting) and my current project, I led the project, got the requirements, designed the architecture from an empty AWS account (yes, using IaC), and delivered it, all without looking at a line of code. I verified the functional and non functional requirements, wrote the hand off documentation, etc.

The customer is happy, my company is happy, and I bet you not a single person will ever look at a line of code I wrote. If they do get a developer to take it over, the developer will be grateful for my detailed AGENTS.md file.

reply
It’s not about hand crafted code or even code performance.

We know from experimentation that agents will change anything that isn’t nailed down. No natural language spec or test suite has ever come close to fully describing all observable behaviors of a non-trivial system.

This means that if no one is reviewing the code, agents adding features will change observable behaviors.

This gets exposed to users as churn, jank, and broken work flows.

reply
That’s easy enough to prevent with modular code; that’s what “plan mode” is for. But you probably never worked with a bunch of C# developers using R#.
reply
1. Preventing agents from crossing boundaries, creating implicit and explicit dependencies, and building false layers requires much more human control over every PR and involvement with the code than you seem to espouse.

2. Assuming that techniques that work with human developers will also work with agents that have severely impaired judgement but are massively faster at producing code is a bad idea.

3. There’s no way you have enough experience with maintaining code written in this way to confidently hand wave away concerns.

reply
Absolutely no one in the value chain cares about “how many layers of abstractions your code has”, not your management or your customers. They care about functional and non functional requirements.
reply
Of course they don’t. Please reread what I said, give it the slightest bit of thought, and re-respond if you want a response from me.
reply
By definition, coding agents are right now the worst they will ever be, and the industry as a whole is by definition the least experienced it will ever be at using them.

So many people on HN are so insulted by the idea that the people who put money in our bank accounts, and in some cases stock in our brokerage accounts, never cared about their bespoke clean code or GOF patterns. They never did; LLMs just made it more apparent.

It’s always been dumb for PR reviews to focus on for loops vs while loops instead of on whether functional and non functional requirements are met.

reply
Wow you have completely lost the plot. It’s like you’re a bot that’s mixing up who he’s replying to.
reply
Just maybe you aren’t making the strong argument you think you are making
reply
"No one cares about handcrafted artisanal code as long as it meets both functional and non functional requirements"

Speak for yourself. I don't hire people like you.

reply
And guess what? You probably don’t pay as much as I make now either…

Even in late 2023 with the shit show of the current market, I had no issues having multiple offers within three weeks just by reaching out to my network and companies looking for people with my set of skills.

reply
I field a small team of experts who are paid upwards of a million GBP in cold-hard cash in London. Not stock. Cash.

You sound like a bozo, I can sniff it through my screen.

reply
This sounds like a place I want to work at.
reply
[flagged]
reply
Yes because I didn’t check to see if Claude code used a for loop instead of a while loop? Or that it didn’t use my preferred GOF pattern and didn’t use what I read in “Clean Code”?

Guess what? I also stopped caring how registers are used and counting clock cycles in my assembly language code like it’s the 80s and I’m still programming on a 1 MHz 65C02.

reply
I can see the argument both ways. Some code is just not worth looking at...

But do you look at any of the AI output? Or is it just "it works, ship it"?

reply
My last project was basically an ETL implementation on AWS starting with an empty AWS account and an internal web admin site that had 10 pages. I am yada yada yadaing over a little bit.

What I checked.

1. The bash shell scripts I had it write as my integration test suite

2. To make sure it wasn’t loading the files into Postgres the naive way: loading the file from S3 and doing bulk inserts instead of using the AWS extension that lets it load directly from S3. It’s the difference between taking 20 minutes and 20 seconds.

3. I had strict concurrency and failure recovery requirements. I made sure it was done the right way.

4. Various security, logging, log retention requirements

What I didn’t look at - a line of the code for the web admin site. I used AWS Cognito for authentication and checked to make sure that unauthorized users couldn’t use the website. Even that didn’t require looking at the code - I had automated tests that tested all of the endpoints.
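For item 2, a sketch of what the "load directly from S3" call can look like, assuming the extension in question is `aws_s3` on RDS/Aurora PostgreSQL and a driver that uses `%s` placeholders (psycopg style). The helper name, table, bucket, and key are hypothetical; the helper only builds the statement and parameters, it does not connect to anything.

```python
from typing import Tuple


def build_s3_import(
    table: str,
    bucket: str,
    key: str,
    region: str,
    columns: str = "",  # empty string means "all columns, in table order"
    options: str = "(format csv, header true)",  # passed through to COPY
) -> Tuple[str, tuple]:
    """Return a parameterized statement asking Postgres to pull the file itself."""
    sql = (
        "SELECT aws_s3.table_import_from_s3("
        "%s, %s, %s, aws_commons.create_s3_uri(%s, %s, %s))"
    )
    return sql, (table, columns, options, bucket, key, region)


# Hypothetical names, for illustration only.
statement, params = build_s3_import(
    "staging.events", "example-bucket", "2025/01/events.csv", "us-east-1"
)
```

The point of the check is that the data never passes through the application: the database fetches the object, rather than the app downloading it and issuing batches of INSERTs.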

reply
This all makes sense.

I've witnessed human developers produce incredibly convoluted, slow "ETL pipelines" that took 10+ minutes to load single digit megabytes of data. It could've been reduced to a shell script that called psql \copy.

reply
> For a person (senior or otherwise) to examine code or configuration with the granularity required to verify that it even approximates the result of their own level of experience, even only in terms of security/stability/correctness, requires an amount of time approaching the time spent if they had just done it themselves.

Hell, often it feels slower/worse. Foreign code is easily confusing at first, which slows you down - and bad code quickly gets bewildering and sends you down paths of clarifications that waste time.

reply
So many times I get AI generated PRs from juniors where I don't feel comfortable with the code, I wouldn't do it like this myself, but I can't strictly find faults that I can reject the PR with. Usually it's just a massive amount of code being generated which is extremely difficult to review, much harder than it was for the submitter to generate and send it for review.

Then often it blows up in production. Makes me almost want to blanket reject PRs for being too difficult to understand. Hand written code almost has an aversion to complexity, you'd search around for existing examples, libraries, reusable components, or just a simpler idea before building something crazy complex. While with AI you can spit out your first idea quickly no matter how complex or flawed the original concept was.

reply
Rejecting a PR for being overly complicated or difficult to understand is valid. Breaking a large change into understandable pieces is an important skill both for making changes reviewable as well as helping the author understand the problem.
reply
> requires an amount of time approaching the time spent if they had just done it themselves

It's actually often harder to fix something sloppy than to write it from scratch. To fix it, you need to hold both the original and the new solution in your head and calculate the difference, which can be very confusing. The original solution can also anchor your thinking to some approach to the problem, which wouldn't happen if you solved it from scratch.

reply
In fairness though, it does give you good practice for the essential skill of maintaining / improving an old codebase.
reply
Sloppy code that has been around for a while works. It likely has support for edge cases you forgot about. Often the sloppiness is because of those edge cases.
reply
That's the incidental (necessary) vs accidental (avoidable) complexity distinction. But I don't think it makes it any easier to deal with.
reply
Those are different things. Often you don't plan for all the necessary things, so something doesn't fit in, even though a better design exists that would let it fit in more neatly. But only years later do you see it, and getting there is now a massive effort you can't afford. The result looks sloppy because in hindsight the right design is obvious.
reply
Right, code reviews should already have been happening with human written junior code.

If AI is a productivity boost and juniors are going to generate 10x the PRs, do you need 10x the seniors (expensive) or 1/10th the juniors (cost save)?

A reminder that in many situations, pure code velocity was never the limiting factor.

Re: idiot proofing, I think this is a natural evolution: as companies get larger they try to limit their downside & manage for the median rather than having a growth mindset in hiring/firing/performance.

reply
Seniors are going to need to hold Juniors to a high bar for understanding and explaining what they are committing. Otherwise it will become totally soul destroying to have a bunch of juniors submitting piles of nonsense and claiming they are blocked on you all the time.
reply
Make them first go through an AI reviewer that is informed by the code base's standards.
reply
This was challenging enough pre AI. Now that everybody has an AI slop button, the life of an effective code reviewer just got so much more miserable.
reply
I.e. senior review is valuable, but it does not make bad code good.

I suspect that isn't the goal.

Review by more senior people shifts accountability from the Junior to a Senior, and reframes the problem from "Oh dear, the junior broke everything because they didn't know any better" to "Ah, that Senior is underperforming because they approved code that broke everything."

reply
> Review by a senior is one of the biggest "silver bullet" illusions managers suffer from

Especially in a big co like Amazon, most senior engineers are box drawers, meeting goers, gatekeepers, vision setters, org lubricants, VP's trustees, glorified product managers, etc. They don't necessarily know more context than the more junior engineers, and they most likely will review slowly while uncovering fewer issues.

reply
This is also why I think we will enter a world without Jr's. The time it takes for a Senior to review the Jr's AI code is more expensive than if the Sr produced their own AI code from scratch. Factor in the lack of meetings from a Sr only team, and the productivity gains will appear to be massive.

Whether or not these productivity gains are realized is another question, but spreadsheet based decision makers are going to try.

reply
In this scenario, how might one become a senior without first being a junior? Seniors just pop into existence?
reply
The business leaders do not care about this yet. I think a lot of people think we already have more Seniors than we will need in the next 5-10 years.

Also - the definition of Senior will change, and a lot of current Seniors will not transition, while plenty of Juniors that put in a lot of time using code agents will transition.

reply
>while plenty of Juniors that put in a lot of time using code agents will transition.

But will they? I'm not at all convinced that babysitting an AI churning out volumes of code you don't understand will help you acquire the knowledge to understand and debug it.

reply
Some will, some will not; hiring interviews and promotion committees will take care of the rest.
reply
The bet from various industry leaders appears to be that the current generation of engineers will be the last who will ever need to think about complex systems and engineering, as the AI will just get good enough to do all of that by the time they retire.
reply
I think it’s deeper than that because it’s affected more industries than software and already started pre AI.

American corporate culture has decided that training costs are someone else’s problem. Since every corporation acts this way it means all training costs have been pushed onto the labor market. Combine that with the past few decades of “oops, looks like you picked the wrong career that took years of learning and/or 10 to 100s of thousands of dollars to acquire but we’ve obsoleted that field” and new entrants into the labor market are just choosing not to join.

Take trucking for example. For the past decade I’ve heard logistics companies bemoan the lack of CDL holders, while simultaneously gleefully talk about how the moment self driving is figured out they are going to replace all of them.

We’re going to be outpaced by countries like China at some point because we’re doing the industrial equivalent of eating our seed corn and there is seemingly no will to slow that trend down, much less reverse it.

reply
> we’re doing the industrial equivalent of eating our seed corn and there is seemingly no will to slow that trend down, much less reverse it.

I know I'm probably coming across as a lunatic lately on HN but I really do think we're on the path towards violence thanks to AI

You just cannot destroy this many people's livelihoods without backlash. It's leading nowhere good

But a handful of people are getting stupidly rich/richer so they'll never stop

reply
I don't think you're a lunatic.

If you look at the luddite rebellion they weren't actually against industrial technology like looms. They were against being told they weren't needed anymore and thrown to the wolves because of the machines.

The rich have forgotten they are made of meat and/or are planning on returning to feudalism à la Yarvin, Thiel, Musk, and co's politics.

reply
> They were against being told they weren't needed anymore and thrown to the wolves because of the machines.

I guess that makes me a modern luddite then

A software engineer luddite

A techno-luddite if you will

Maybe I have a new username

reply
It could create the right sort of incentives though. If I'm a junior and I suddenly have to take my work to a senior every time I use AI, I'm going to be much more selective about how I use it and much more careful when I do use it. AI is dangerous because it is so frictionless and this is a way to add friction.

Maybe I don't have the correct mental model for how the typical junior engineer thinks though. I never wanted to bug senior people and make demands on their time if I could help it.

reply
What you're actually going to see is seniors inundated by slop and burning out and quitting because what used to be enjoyable solving of problems has become wading through slop that took 10 minutes to generate and submit but 30+ minutes to understand and write up a critique for it.
reply
not even that. implementing this requirement could be a general work stoppage whenever the senior engineer is in all day meetings or on vacation

With a layout of 4 juniors, 5 intermediates, and 0-1 senior per team, putting all the changes through senior engineer review means you mostly won't be able to get CRs approved.
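
To put rough numbers on that bottleneck, here is a back-of-the-envelope sketch. The team layout (4 juniors plus 5 intermediates, one senior) comes from the comment above, and the 30 minutes per review echoes the figure quoted earlier in the thread. The PRs per author per day and the senior's available review hours are made-up assumptions, not data.

```python
# Back-of-the-envelope review load for a single senior reviewer.
# All inputs are illustrative assumptions, not measurements.

def review_backlog_per_day(authors: int,
                           prs_per_author_per_day: float,
                           minutes_per_review: float,
                           senior_review_hours_per_day: float) -> float:
    """Return the number of PRs left unreviewed per day (0.0 if the senior keeps up)."""
    incoming = authors * prs_per_author_per_day
    capacity = (senior_review_hours_per_day * 60) / minutes_per_review
    return max(0.0, incoming - capacity)


if __name__ == "__main__":
    backlog = review_backlog_per_day(
        authors=9,                      # 4 juniors + 5 intermediates
        prs_per_author_per_day=1.0,     # assumed
        minutes_per_review=30,          # "30+ minutes" from the thread
        senior_review_hours_per_day=2,  # assumed time left after meetings
    )
    print(f"PRs piling up per day: {backlog:.1f}")
```

Under these assumptions the senior clears 4 reviews a day against 9 incoming, so the queue grows by 5 a day. With zero seniors on the team, nothing gets approved at all.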

I guess it could result in forcing everyone who's sandbagging as intermediate instead of going to senior to have to get promoted?

reply
In my experience, Claude and the juniors piloting it are usually receptive to quick feedback along the lines of "This is unreasonably hard to understand, please try refactoring it this way and let me know when it's cleaner".
reply
Can I interest you in a bunch of emoji-laden comments?
reply
> Review by a senior is one of the biggest "silver bullet" illusions managers suffer from.

My manager has been urging us to truly vibe code, just yesterday saying that "language is irrelevant because we've reached the point where it works - so you don't need to see it." This article is a godsend; I'll take this flawed silver bullet any day of the week.

reply
Why only AI generated code? I wouldn’t let a junior or mid level developer’s code go into production without at least verifying the known hotspots - concurrency, security, database schema, and various other non functional requirements that only bite you in production.

I’m probably not going to review a random website built by someone except for usability, requirements and security.

reply
I’ve seen hundreds of PRs produced by a junior and reviewed by a mid-level engineer go into prod. I don’t see any problem with that.
reply
I didn't restrict my opinion to genAI code. I'm expressing a general thought that was relevant before AI. AI is just salient in relation to it.

I also said senior review is valuable, but I'm not 100% sure if you're implying I didn't.

reply
Senior review can definitely help, regardless of whether the code comes from a junior or an LLM. We've done this since the dawn of time. However, it doesn't scale, and since LLM volume far exceeds what juniors can do, you end up overwhelming the seniors, who are normally overbooked anyway.

The other problem is that the types of errors LLMs make are different from the ones juniors make. There are huge sections of genuinely good code. So the senior gets "review fatigue": because so much looks good, they just start rubber stamping.

I use an automated pipeline to generate code (including terraform, risking infrastructure nukes), and I am the senior reviewer. But I have gates that do a whole range of checks, both deterministic and stochastic, before it ever gets to me. Easy things are pushed back to the LLM for it to autofix. I only see things where my eyes can actually make a difference.

Amazon's instinct is right (add a gate), but the implementation is wrong (make it human). Automated checks first, humans for what's left.
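
A minimal sketch of that ordering, assuming two toy deterministic gates. The names `Change`, `no_todo_markers`, `small_enough`, and `route` are hypothetical and not from any real pipeline; the stochastic (LLM-based) checks mentioned above are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Change:
    diff: str
    notes: List[str] = field(default_factory=list)


# A gate returns None on pass, or a short message describing the failure.
Gate = Callable[[Change], Optional[str]]


def no_todo_markers(change: Change) -> Optional[str]:
    """Hypothetical gate: reject diffs that still contain TODO markers."""
    return "leftover TODO" if "TODO" in change.diff else None


def small_enough(change: Change, max_lines: int = 400) -> Optional[str]:
    """Hypothetical gate: reject diffs longer than max_lines lines."""
    n = len(change.diff.splitlines())
    return f"diff too large ({n} lines)" if n > max_lines else None


def route(change: Change, gates: List[Gate]) -> str:
    """Run every automated gate; only fully passing changes reach a human."""
    failures = []
    for gate in gates:
        message = gate(change)
        if message is not None:
            failures.append(message)
    change.notes.extend(failures)
    return "back_to_llm" if failures else "human_review"


if __name__ == "__main__":
    change = Change(diff="x = 1\n# TODO: handle errors")
    print(route(change, [no_todo_markers, small_enough]), change.notes)
```

The point of the design is that the human only sees changes that have already passed every automated gate, and the failure notes go back to the generator rather than into the reviewer's queue.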

reply
I seriously doubt that they think senior reviewers will meticulously hunt down and fix all the AI bugs. Even if they could, they surely don't have the time. But it offers other benefits here:

1. They can assess whether the use of AI is appropriate without looking in detail. E.g. if the AI changed 1000 lines of code to fix a minor bug, or changed code that is essential for security.

2. To discourage AI use, because of the added friction.
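
Point 1 is the kind of check that needs no line-by-line reading. A rough sketch, assuming a hypothetical list of security-sensitive path prefixes and an arbitrary 100-line threshold for what counts as a reasonable "minor fix":

```python
from typing import Iterable, List

# Hypothetical path prefixes; a real repo would define its own.
SENSITIVE_PREFIXES = ("auth/", "crypto/", "iam/")


def needs_closer_look(lines_changed: int,
                      paths: Iterable[str],
                      is_minor_fix: bool,
                      max_lines_for_minor: int = 100) -> List[str]:
    """Return the reasons a change deserves scrutiny (empty list if none)."""
    reasons = []
    if is_minor_fix and lines_changed > max_lines_for_minor:
        reasons.append("large diff for a minor fix")
    if any(path.startswith(SENSITIVE_PREFIXES) for path in paths):
        reasons.append("touches security-sensitive code")
    return reasons


if __name__ == "__main__":
    print(needs_closer_look(1000, ["ui/button.py"], is_minor_fix=True))
```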

reply
The unwritten thing is that if you need seniors to review every single change from junior and mid-level engineers, and those engineers are mostly using Kiro to write their CRs, then what stops the senior from just writing the CRs with Kiro themselves?
reply
What a statement at the end. You are absolutely right.

I hear “x tool doesn’t really work well” and then I immediately ask: “does someone know how to use it well?” The answer “yes” is infrequent. Even a yes is often a maybe.

The problem is pervasive in my world (insurance). Number-producing features need to work in a UX and product sense but also produce the right numbers, and within range of expectations. Just checking the UX does what it’s supposed to do is one job, and checking the numbers an entirely separate task.

I don’t know many folks that do both well.

reply
Going to systemically turn off your senior staff over time also. Most Senior Engineers aren't that interested in doing even more code review.
reply
Also, have massive layoffs every few months just to keep people on edge. AWS wants people to leave with RTO and badging policies, comp range shifts lower unless you have year over year ratings, and an obsessive push to force AI into every process. Top talent is leaving and will continue to leave AWS.
reply
The goal of Sr code review is not to make the code better, it's to make the author better.
reply
Agree but even broader: authors. I always viewed reviews as targeting Brooks's less famous findings about the optimal team size being one, and asking how we can get better at building systems too big for the individual. I think code review is about shared, consistent understanding, with catching bugs a nice side effect (or justification for the bean counters).
reply
I agree, made (mostly) that point in my top level comment. Code reviews (both in the normal GitHub flow, but also small meetings, design reviews, etc) all help to tie the team together and improve quality.
reply
That's not going to work when the author is an LLM
reply
Deming's point 3 (of 14): Cease dependence on inspection to achieve quality. Eliminate the need for massive inspection by building quality into the product in the first place.
reply
Don’t forget that this auto-generated code will have subtle bugs and feel complete at the outset.
reply
Senior reviews are useful, but as I understand it, Amazon has a fairly high turnover rate, so I wonder just how many seniors with deep knowledge of the codebase they could possibly have.
reply
From "engineers are interchangeable" to high turnover, these are decisions that the company took. The payback time always comes at some point.
reply
>requires an amount of time approaching the time spent if they had just done it themselves.

I would actually say having at least 2 people on any given work item should probably be the norm at Amazon's size if you also want to churn through people as Amazon does and also want quality.

Doing code reviews is not as highly valued in terms of incentives for employees, and it blocks them from working on things they would get more compensation for.

reply
What stops the senior from using AI to review the AI generated code the junior published?
reply
That’s something that the junior can do. What companies want to do is put responsibility on someone who has more knowledge and skin in the game
reply
Other than “don’t hire idiots”, what is the solution to this problem? I agree with you, and this particular systems management issue is not constrained to software.
reply
I don't know.

We need smart people at every layer. If leadership isn't in that category, it spreads to all layers.

I don't know how we defeat capitalism to incentivize smart leadership. It's fundamentally opposed to market forces.

reply
the outcome of the review isn't just that the code gets shipped, it's knowledge transfer from the senior engineer to the junior engineers that then creates more senior engineers
reply
Reviewing code changes (generally) takes more time than writing code changes for a pretty significant chunk of engineers. If we're optimizing slop code writing at the expense of more seniors' time being funneled into detailed reviews then we're _doing it wrong_.
reply
LGTM
reply
Who said PR reviews need to solve all the things and result in proof against idiots?

So you're saying that peer reviews are a waste of time and only idiots would use/propose them?

reply
None of that, sorry if I wasn't clear.

To partially clarify: "Idiot proof" is a broad concept that here refers specifically to abstraction layers, more or less (e.g. a UI framework is a little "idiot proof"; a WYSIWYG builder is more "idiot proof"). With AI, it's complicated, but bad leadership is over-interpreting the "idiot proof" aspects of it. It's a phrase, not an insult to users of these tools.

reply