Idk if it’s because I set codex to xhigh reasoning, but even then it still seems way higher than Claude. The input/output ratio feels large too, eg I have codex session which says ~500M in / ~2M out.
It used to give me precise answers, "surgical" is how I described it to my friends. Now it generates a lot of slop and plenty of "follow ups". It doesn't give me wrong answers, which is ok, but I've found that things that used to take 3-4 prompts now take 8-10. Obviously my prompting skills haven't changed much and, if anything, they've become better.
This is something that other colleagues have observed as well. Even the same GPT5.4 model feels different and more chatty recently. Btw, I think their version numbers mean nothing, no one can be certain about the model that is actually running on the backend and it is pretty evident that they're continuously "improving" it.
Just that they took down some "io" mentions because of some trademark dispute with a third party "iyo".