I too tried Codex and found it similarly hard to control over long contexts. It ended up coding an app that spit out millions of tiny files which were technically smaller than the original files it was supposed to optimize, except due to there being millions of them, actual hard drive usage was 18x larger. It seemed to work well until a certain point, and I suspect that point was context window overflow / compaction. Happy to provide you with the full session if it helps.
I’ll give Codex another shot with 1M. It just seemed like cperciva’s case and my own might be similar in that once the context window overflows (or refuses to fill) Codex seems to lose something essential, whereas Claude keeps it. What that thing is, I have no idea, but I’m hoping longer context will preserve it.
Feels like a losing battle, but hey, the audience is usually right.
https://apps.apple.com/us/app/clean-links-qr-code-reader/id6...
Anywhere I can toss a Tip for this free app?
Especially when it’s to the point of, you know, nagging/policing people to do it the way you’d prefer, when you could just redirect your router requests from x.com to xcancel.com
i’m not being facetious, honest question, especially considering ads are the only thing paying these people these days
Reverse engineering [1]. When decompiling a bunch of code and tracing functionality, it's really easy to fill up the context window with irrelevant noise and compaction generally causes it to lose the plot entirely and have to start almost from scratch.
(Side note, are there any OpenAI programs to get free tokens/Max to test this kind of stuff?)
Sometimes I’m exploring some topic and that exploration is not useful but only the summary.
Also, you could use the best guess and cli could tell me that this is what it wants to compact and I can tweak its suggestion in natural language.
Context is going to be super important because it is the primary constraint. It would be nice to have serious granular support.
Like, I'd love an optional pre-compaction step, "I need to compact, here is a high level list of my context + size, what should I junk?" Or similar.
However, in some harnesses the model is given access to the old chat log/"memories", so you'd need a way to provide that. You could compromise by running /compact and pasting the output from your own summarizer (that you ran first, obviously).
The agent could pre-select what it thinks is worth keeping, but you’d still have full control to override it. Each chunk could have three states: drop it, keep a summarized version, or keep the full history.
That way you stay in control of both the context budget and the level of detail the agent operates with.
But I'm mostly working on personal projects so my time is cheap.
I might experiment with having the file sections post-processed through a token counter though, that's a great idea.
I'm now more careful, using tracking files to try to keep it aligned, but more control over compaction regardless would be highly welcomed. You don't ALWAYS need that level of control, but when you do, you do.
For me, I would say speed (not just time to first token, but a complete generation) is more important then going for a larger context size.
I've also had it succeed in attempts to identify some non-trivial bugs that spanned multiple modules.