While code quality may go up on a case-by-case basis, we should be mindful of whether we are comparing it to our own personal baselines, or to the average code quality across the board. I.e. will the baseline regress to the mean, or raise the floor for the average coder?
I absolutely have deeply mixed feelings about these tools, the ethics associated with them, the impact on the industry, on the talent pipeline, etc.
But I also can't deny that they are incredibly powerful tools that are here to stay in one form or another.
And I say that as someone who, a year ago, was absolutely convinced that they were incremental at best and scoffed at everyone who said something like "yeah but they're so much better now!" or "they're only going to get better!"
Well, they were right, they did, and the world has changed. AI-generated code is landing in the Linux kernel. 250+ security holes were found and fixed in Firefox. The impact is here and now, and it's mixed and ugly and complicated.
The amount of slop produced even in a company setting is staggering, and I don't like it one bit that neither the submitter nor the reviewer of the PR did their due diligence. And I am only complaining because it then becomes my problem. So, then I have to start nagging people to clean that up. I can say with 100% certainty that the problems I face now would not have happened without LLMs.
That said, used with care, with proper supervision, and with the diligence to review what the LLMs did, I still think they can be and are beneficial.
I think that we are just not used to getting results of questionable quality from the tools we use. So, I am hopeful that we will learn and that it will improve with time, but I still find myself dreading the age of the vibe coder.
I also think reading and reviewing code is a skill that is connected to, but very much independent of, the writing of code, and the use of coding agents requires us to be far more skilled and diligent at it.
So put another way, people who were good at coding without agents may in fact be a poor fit with them, which means the entire industry is experiencing a dislocation between skills we have and skills we need, leading to extremely bimodal outcomes.
In fact, from my personal experience, going from junior to mid to senior, that was the hardest thing. Reading someone else's code and working out whether what they did was really correct and would not have additional undesired side effects was hard to become efficient at (it didn't help that we were working in C back then).
So, really, I think that for juniors it's actually much harder, because if they want to do their due diligence they have to do the same evaluation but without the years of experience working with that code base. I can understand, even if I don't like it, that they just submit the output of the LLM for the senior to review.
But I agree fully with your last paragraph, and said something similar in a comment elsewhere, where I stated my tangible bar as being a Ladybird-like browser built from scratch achieving Chrome parity in six months while doing continuous stable releases with coding agents in tow. Otherwise, as you said, the jury is still out.