I have an example of interior decorating inpainting where I replaced a large floor-to-ceiling window with a mirror. That was with NB Pro nearly a year ago, and the result was pretty impressive.
Locally hostable? For my money it's Flux.2 Klein, but Qwen-Edit still puts in the work.
I do agree, however, that the Flux.2 family is the SOTA at the moment. Running locally via something like ComfyUI gets incredible results.
If you want real precision (especially for complex polygonal masks), or if you're concerned about image degradation over multiple edit rounds, you'll slam into the limitations of those approaches.
Even with SOTA proprietary models, repeatedly editing and re-uploading an image is like making a copy of a copy of a VHS tape: you're gonna see subtle color shifts and quality loss steadily accumulate.
At that point, you either need to put in the manual work in something like Photoshop (bringing elements in as layers and masking them properly) or, as you mentioned, use a model or workflow that properly supports masking.
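To make the masking point concrete, here's a rough sketch of the "paste back" idea in plain Python: rasterize a polygon into a mask, then take the edited pixels only inside it, so everything outside stays identical to the original no matter how many rounds you do. The names (`polygon_mask`, `paste_back`) and the tiny grayscale grids are made up for illustration; in a real workflow you'd do the same thing with NumPy/Pillow arrays or a mask node in your tool of choice.

```python
def point_in_polygon(x, y, poly):
    """Even-odd ray casting test for a point against a list of (x, y) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_mask(width, height, poly):
    """Rasterize a polygon into a boolean mask, sampling at pixel centers."""
    return [[point_in_polygon(c + 0.5, r + 0.5, poly) for c in range(width)]
            for r in range(height)]


def paste_back(original, edited, mask):
    """Keep original pixels outside the mask; take edited pixels inside it."""
    return [[e if m else o for o, e, m in zip(orow, erow, mrow)]
            for orow, erow, mrow in zip(original, edited, mask)]


# Hypothetical 4x4 grayscale images; `edited` stands in for a model's output.
original = [[10] * 4 for _ in range(4)]
edited = [[99] * 4 for _ in range(4)]
mask = polygon_mask(4, 4, [(1, 1), (3, 1), (3, 3), (1, 3)])
result = paste_back(original, edited, mask)
```

Because untouched pixels are copied straight from the original, the "copy of a copy" drift can only happen inside the masked region.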
So you're saying that, if I can calculate from the picture the position (height, inclination and such), and I can render the model (should be doable) for that height and angle, my best course of action could be to combine original + render and only at the end use a visual model? That could be interesting.
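If it helps, the "combine original + render" step could be as simple as an alpha blend, something like the sketch below. This assumes the render already lines up with the photo and comes with a per-pixel alpha (1.0 where the model covers the pixel, 0.0 where it doesn't); the function name and the one-row grayscale data are just placeholders.

```python
def blend(original, render, alpha):
    """Per-pixel alpha blend: out = alpha * render + (1 - alpha) * original."""
    out = []
    for orow, rrow, arow in zip(original, render, alpha):
        out.append([round(a * r + (1 - a) * o)
                    for o, r, a in zip(orow, rrow, arow)])
    return out


# Hypothetical single-row images: background photo, render, and render alpha.
original = [[100, 100, 100]]
render = [[200, 200, 200]]
alpha = [[0.0, 0.5, 1.0]]  # untouched, soft edge, fully covered
composite = blend(original, render, alpha)
```

The composite would then be what you hand to the visual model for the final pass.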