They are large language models. Not automated development machines. They hallucinate.
The goalposts haven't moved since 2023 or so: make an LLM that doesn't blatantly disregard knowledge it has and instructions it has been given, over and over, and you win. If trillions of USD in investment can't do it, I'd be curious to see what can.
If the AI is not good enough, then don't fire the devs. If/when the devs are no longer needed, I don't see why the need for them would return later; that was my point.