Just curious, as I'm trying to branch out from using Claude for everything. I've been following a workflow somewhat similar to yours, except that I just have Claude review and re-review its own plan (sometimes in different roles, e.g. system architect vs. SWE vs. QA engineer), and it similarly catches issues that it missed originally.
But now I'm curious to try this while weaving in more GPT.