That issue, and the issue of "aesthetics", are the biggest complaints I have today. I don't know exactly how to define aesthetics, but it's when the AI makes decisions that no experienced developer or designer would make. The result may be functionally correct but "ugly" to another developer or an end user.
An example is a case I ran into yesterday: code that parsed a config and, on a configuration error, failed and logged it. The log named the specific item that was invalid, but not its group or any notion of where in the config the error was. Of course, item names can be duplicated in different parts of the config. It's small, but correcting these minor things takes time, and they are the kind of decisions no one with experience writing code and debugging a config problem would have made. This was Opus 4.8/max, too.
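To make the config complaint concrete, here is a minimal sketch of what an experienced developer would likely expect: the error carries the full path to the bad item, so a duplicated leaf name is still unambiguous. Everything here is hypothetical and chosen for illustration (the `ConfigError` class, the `validate_ports` function, the rule that any key named `port` must be an int in 1..65535, and the sample config); it is not the code from the case above.

```python
from __future__ import annotations

from typing import Any


class ConfigError(ValueError):
    """Hypothetical error type: carries the full path to the invalid item."""

    def __init__(self, path: list[str], message: str) -> None:
        self.path = path
        # The message leads with the dotted path, e.g. "servers.admin.port".
        super().__init__(f"{'.'.join(path)}: {message}")


def validate_ports(node: Any, path: list[str] | None = None) -> None:
    """Recursively check that every key named 'port' holds an int in 1..65535.

    The path to each value is tracked during the walk, so an error names
    the group as well as the leaf key ('port' can appear in many groups).
    """
    path = path or []
    if not isinstance(node, dict):
        return  # leaf values other than 'port' are not checked in this sketch
    for key, value in node.items():
        child = path + [str(key)]
        if key == "port":
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 65535:
                raise ConfigError(child, f"invalid port {value!r}")
        else:
            validate_ports(value, child)


# Hypothetical config where 'port' appears in three different groups.
config = {
    "servers": {
        "web": {"host": "0.0.0.0", "port": 8080},
        "admin": {"host": "127.0.0.1", "port": "eighty"},
    },
    "metrics": {"port": 9090},
}

try:
    validate_ports(config)
except ConfigError as err:
    print(f"config error: {err}")
```

Running this prints `config error: servers.admin.port: invalid port 'eighty'`. Logging only `port: invalid port 'eighty'` would leave the reader guessing which of the three `port` entries is wrong, which is exactly the debugging cost described above.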
* SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios https://arxiv.org/abs/2512.18470
* SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration https://arxiv.org/abs/2603.03823