upvote
Yes indeed. More than 1 million lines of code (including tests) is jumping lots of hoops but with LLMs it's not as painful so you can just ask it to do the hard things.

Example of a Claude Code session after 2 hours of "Crunching" that came out without results https://github.com/mohsen1/tsz/pull/4868 (Edit I force pushed to PR to solve the problem, you can see the initial refuse message in the initial version of PR description)

Funny thing is, the last percent of the test have been so hard to work on that Opus 4.7 routinely bails and says "it's too involved or complicated" so I had to add prompts specifically asking it not to bail.

reply
You should try GPT, I’d be really interested to hear if it works better. (Exclusively using GPT for systems work at $DAYJOB, but compare with opus every couple weeks and GPT consistently gives me better results)
reply
I've been comparing Claude vs Codex using GPT and Claude consistently is better than GPT about reasoning, about writing code, and using the tools as appropriate.

GPT for instance had a lot of issues using git worktrees, and didn't understand how to correctly use it to then merge stuff back into a main branch, vs Claude which seems to do this much more naturally.

GPT also left me with broken tests/code that I had to iterate on manually, Claude is much better about reasoning through code. Primarily Python.

reply
> GPT for instance had a lot of issues using git worktrees, and didn't understand how to correctly use it to then merge stuff back into a main branch, vs Claude which seems to do this much more naturally.

I wonder how much of that is due to the model being somehow better, or the harness having built-in instructions on how to use them.

I've used worktrees with Codex just fine, but I instructed it to use my scripts for setting it up and tearing it down. The scripts also reflinked existing compilation artifacts to speed up compiling and allocated a fresh db instance for it, but then also applied a simple protocol for locking the master repository during merges, so multiple agents wouldn't try to merge at the same time. It has been following those instructions quite well.

reply
deleted
reply
OpenAI gave me that 10x boost and used it all already for this week. I'm guessing the last 50 tests is only doable by GPT 5.5 xhigh
reply
Do you have any write ups on your workflow with Claude and github dev?
reply
That might be opus 4.7 behaviour because I also get that all the time in the past few weeks. Also complex code base, but likely an order of magnitude simpler than yours.
reply
deleted
reply
They mentioned that they wanted to port their compiler over to retain existing behavior (vs a re-write) and Rust has a hard time with their cyclic data structures.
reply
Is GC useful for a static type checker? Or did they make a new runtime?
reply
The point is that having a GC will affect your data structure and algorithm design, so it’s easier to automatically transform JS or TS to Go than to rust because you’re mostly reducing things down to one problem (translation) rather than multiple intertwined problems.
reply
tyscript compiler is a cli tool. and is run for short periods of time. GC collection and memory leaks should be least of issue to look for
reply