The important role for me, as a SWE, in the process, is verify that the code does what we actually want it to do. If you remove yourself from the process by letting it run on its own overnight, how does it know it's doing what you actually want it to do?
Or is it more like with your usecase—you can say "here's a failing test—do whatever you can to fix it and don't stop until you do". I could see that limited case working.
I don't even necessarily ask it to fix the bug — just identify the bug
Like if I've made a change that is causing some unit test to fail, it can just run off and figure out where I made an off-by-one error or whatever in my change.
Bad idea. It can modify the code that the test passes but everything else is now broken.
https://github.com/snarktank/ralph
Its constantly restarting itself, looking at the current state of things, re-reading what was the request, what it did and failed at in the past (at a higher level), and trying again and again.
This is impressive, you’ve completely mitigated the risk of learning or understanding.
I don't discount the value of blood, sweat and tears spent on debugging those hard issues, and the lessons learned from doing so, but there is a certain point where it's OK to take a pass and just let the robots figure it out.