The main difference is in the frontend skills. GPT produces terrible design. What I do these days is ask Opus to produce an HTML mockup, then feed it to Codex.
I have not had problems with long goals. I let it chomp for 40 minutes on a proof in my custom theorem prover (xhigh fast), and it got there. Very happy with Codex, I ditched Claude for it.