upvote
I've been comparing Claude vs Codex using GPT and Claude consistently is better than GPT about reasoning, about writing code, and using the tools as appropriate.

GPT for instance had a lot of issues using git worktrees, and didn't understand how to correctly use it to then merge stuff back into a main branch, vs Claude which seems to do this much more naturally.

GPT also left me with broken tests/code that I had to iterate on manually, Claude is much better about reasoning through code. Primarily Python.

reply
> GPT for instance had a lot of issues using git worktrees, and didn't understand how to correctly use it to then merge stuff back into a main branch, vs Claude which seems to do this much more naturally.

I wonder how much of that is due to the model being somehow better, or the harness having built-in instructions on how to use them.

I've used worktrees with Codex just fine, but I instructed it to use my scripts for setting it up and tearing it down. The scripts also reflinked existing compilation artifacts to speed up compiling and allocated a fresh db instance for it, but then also applied a simple protocol for locking the master repository during merges, so multiple agents wouldn't try to merge at the same time. It has been following those instructions quite well.

reply
deleted
reply
OpenAI gave me that 10x boost and used it all already for this week. I'm guessing the last 50 tests is only doable by GPT 5.5 xhigh
reply