Not sure why good coverage wouldn't mitigate risk in a refactor...
My mantra whenever I'm working with AI is that I want it to know what "point b" looks like and be able to tell by itself whether it's gotten there...
If you have a working implementation, it sounds like you have a basis for automated tests to be written... once you have that (assuming that the tests are written to test the interface rather than the implementation), then it should be fairly direct to have an agent extract and decompose...