You should try GPT, I’d be really interested to hear if it works better. (I’m exclusively using GPT for systems work at $DAYJOB, but I compare with Opus every couple of weeks and GPT consistently gives me better results.)
reply
I've been comparing Claude against Codex (which uses GPT), and Claude is consistently better than GPT at reasoning, at writing code, and at using tools appropriately.

GPT, for instance, had a lot of issues using git worktrees and didn't understand how to use them correctly to merge work back into the main branch, whereas Claude seems to do this much more naturally.

GPT also left me with broken tests/code that I had to iterate on manually; Claude is much better at reasoning through code. This is primarily Python.

reply
> GPT, for instance, had a lot of issues using git worktrees and didn't understand how to use them correctly to merge work back into the main branch, whereas Claude seems to do this much more naturally.

I wonder how much of that is the model itself being better versus the harness having built-in instructions on how to use them.

I've used worktrees with Codex just fine, but I instructed it to use my scripts for setting them up and tearing them down. The scripts also reflinked existing compilation artifacts to speed up compilation and allocated a fresh db instance per worktree, and they applied a simple protocol for locking the master repository during merges so multiple agents wouldn't try to merge at the same time. It has been following those instructions quite well.
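For anyone curious, the setup/merge-lock/teardown flow can be sketched roughly like this. This is my own illustration, not the actual scripts: the repo layout, branch names, lock-file path, and the throwaway demo repo are all assumptions, and the reflink/db steps are only noted in comments.

```shell
set -euo pipefail

# --- demo scaffolding so the sketch is self-contained: a throwaway repo ---
cd "$(mktemp -d)"
git init -q repo
REPO=$PWD/repo
git -C "$REPO" config user.email bot@example.com
git -C "$REPO" config user.name bot
git -C "$REPO" commit -q --allow-empty -m "init"
git -C "$REPO" branch -M main

TASK=demo                         # one worktree per agent/task
WT=$PWD/wt-$TASK

# setup: fresh branch + worktree. This is also where compilation
# artifacts could be reflinked in (e.g. cp --reflink=auto) and a
# scratch db instance allocated for the agent.
git -C "$REPO" worktree add -q -b "agent/$TASK" "$WT" main

# ... the agent works inside "$WT" ...
git -C "$WT" commit -q --allow-empty -m "agent change"

# merge back under an exclusive flock so concurrent agents serialize
(
  flock 9                         # blocks until the master-repo lock is free
  git -C "$REPO" merge -q --no-ff -m "merge agent/$TASK" "agent/$TASK"
) 9>"$REPO/.git/agent-merge.lock"

# teardown
git -C "$REPO" worktree remove "$WT"
git -C "$REPO" branch -q -d "agent/$TASK"
git -C "$REPO" log --oneline -1
```

The `9>lockfile` plus `flock 9` pattern is the simple part: every merging agent opens the same lock file on fd 9 and `flock` queues them, so only one merge touches the master repo at a time.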

reply
OpenAI gave me that 10x boost and I've already used it all for this week. I'm guessing the last 50 tests are only doable by GPT 5.5 xhigh.
reply
Do you have any write-ups on your workflow with Claude and GitHub dev?
reply
That might be Opus 4.7 behaviour, because I've also been getting that all the time in the past few weeks. Also a complex codebase, though likely an order of magnitude simpler than yours.
reply