If you're truly "managing fleets of agents" there's no way you're able to sift through the good and the bad in the output. If your AI-generated code is evolvable (which is hard to tell right now) then you're not writing it with "fleets of agents". If you are writing it with fleets of agents, I would bet it's not evolvable; you just haven't reached the breaking point yet.
Isn't this wrong? I thought engineered systems meant something designed with limits.