Because it is that uneven. Some problems it nails on the first go, or with very few cosmetic changes.
On others it commits to a solution, hallucinates parts that don't exist (API calls or config options that aren't real), and gets the basics wrong.
Similarly, if you're doing something that follows a fairly common pattern, it usually nails it. If what you're doing subtly differs from a common pattern, it will just implement the common pattern and you end up with something wrong.
I've thought about this and I think the reason is as follows: we hold code written by ourselves to a much higher standard than code written by somebody else. If you think of AI code as your own code, then it probably won't seem very acceptable, because it lacks the beauty (partly subjective, as all beauty tends to be) that we put into our own code. If you think of it as a coworker's code, then it's usually alright, i.e. you wouldn't be wildly impressed with that coworker, but it also wouldn't be bad enough to raise a stink.
It follows from this that it also depends on how you regard the codebase you're working on. Do you think of it as a personal masterpiece, or as some mishmash camel-by-committee, as codebases at work tend to be?
What people have is radically different expectations.
I noticed engineers will review Claude's output and go "holy crap, that's junior-level code." Coders will just commit, because looking at the code is a waste of time. Move fast, break things, disrupt, drown yourself in tech debt: the investors won't care anyway.
And no, telling the agent to "be less shit" doesn't work. I have to painstakingly point out every single shit architectural decision before Claude can even see it and fix it. "Git gud" didn't work for people and doesn't work for LLMs.
It's not that the code isn't DRY; it's DRY at the wrong points of abstraction, which is even worse than not being DRY. I manage to find better patterns in each and every task I tell Claude or Copilot to work on autonomously, dropping tons of code in the process (DRY or not). You can't prompt Claude out of making these wrong decisions (at best, out of very basic mistakes), since they are too granular to even extract a rule from.
This is what separates a senior from a junior.
If you think Claude writes good code either you're very lucky, I'm very bad at prompting, or your standards are too low.
Don't get me wrong. I love Claude Code, but it's just a tool in my belt, not an autonomous engineer. Seeing all these "Claude wrote 97% of my code" posts makes me shudder at the amount of crap I will have to maintain 5 years down the line.
It's bitten me several times at work, and I'd rather not waste any more of my limited time on the re-prompt -> manually-fix-the-code cycle. I'm capable of doing this myself.
It's great for the simple tasks tho, and most feature work is simple tasks IMO. Those were only "costly" in the sense that it used to take a while to read the code, find the appropriate changes, write tests for those changes, etc. LLMs shorten that cycle, but that type of work generally isn't the majority of my time at my job.
I've worked at feature factories before, it's hell. I can't imagine how much more hell it has become since the introduction of these tools.
Feature factories treat devs as literal assembly-line machines: output is the only thing that matters, not quality. Having that mass-induced by these tools is just so shitty for workers.
I fully expect a backlash in the upcoming years.
---
My only Q to the OP of this thread is what kind of teacher they are, because teaching people anything about software while admitting that you no longer write code because it's not profitable (big LOL at caring about money over people) is just beyond pathetic.
This means it can do anything in the VM, install dependencies, etc. So far, it has managed to bork the VM once (unbootable). I could have spent a bit of time figuring out what happened, but I had a script to rebuild the VM, so I didn't bother. To be entirely fair to Claude, the VM runs Arch Linux, which is definitely easier to break than other distros.