This task was famously difficult even back when it was people producing unmaintainable mountains of code, millions of lines of it. Shipping anything sizable in a working state, on time, without last-minute scope reductions was nearly unheard of.
I can't imagine that using AI to add another one or two zeroes to the line count would help anyone reach the goalpost.
LLMs can write a lot of code. They can even write a comprehensive test suite for that code. However, they can't tell you that it doesn't work because of some interaction with something else you didn't think about. They can't tell you that all the race conditions are really fixed (even though they are somewhat good at tracking them down once they're known). And they can't tell you that the program fails because it doesn't do something critical that nobody thought to put in the requirements until someone noticed it was missing.