This is one of those areas where you might have been right.. 4-6 months ago. But if you're paying attention, the floor has moved up substantially.
For the work I do, last year the models would occasionally produce code with bugs, linter errors, etc, now the frontier models produce mostly flawless code that I don't need to review. I'll still write tests, or prompt test scenarios for it but most of the testing is functional.
If the exponential curve continues I think everyone needs to prepare for a step function change. Debian may even cease to be relevant because AI will write something better in a couple of hours.