Do you have any session logs or similar that show this? Never once, since I started using the codex TUI when it became available, have GPT models gotten stuck on something another model breezes through. I quite literally run every prompt I do through multiple providers, so this would be very visible very quickly for me.
I remember trying every -codex variant of the models and could never get them to be productive for tasks taking longer than 5-10 minutes, compared to GPT 5.5, which quite literally worked through the night (with the /goal feature) and actually had something valuable and useful in the end this morning that wasn't exploding in LOC and complexity. I don't think any of the -codex variants would have been able to do this at all, based on how they worked when I last used them.