[-]

> With Codex (5.3), the framing is an interactive collaborator: you steer it mid-execution, stay in the loop, course-correct as it works.

> With Opus 4.6, the emphasis is the opposite: a more autonomous, agentic, thoughtful system that plans deeply, runs longer, and asks less of the human.

Isn't the UX the exact opposite? Codex thinks much longer before it gives you back the answer.

by xd19364 hours ago|

parent|

[-]

I've also had the exact opposite experience with tone. Claude Code wants to build with me, and Codex wants to go off on its own for a while before returning with opinions.

by mrkstu4 hours ago|

parent|

[-]

It's likely that both are steering towards the middle from their current relative extremes and converging to nearly the same place.

by gervwyk3 hours ago|

parent|

[-]

That's also my experience using these two models. They are perhaps trying to recover from oversteer.

by WilcoKruijer3 hours ago|

parent|

prev|

[-]

Yes, you’re right for 4.5 and 5.2. Hence they’re focusing on improving the opposite thing and thus are actually converging.

by bt1a2 hours ago|

parent|

prev|

[-]

This is most likely an inference serving problem in terms of capacity and latency, given that Opus X and the latest GPT models available in the API have always responded quickly and slowly, respectively.

by cwyers2 hours ago|

parent|

prev|

[-]

Codex now lets you tell the LLM things in the middle of its thinking without interrupting it, so you can read the thinking traces and tell it to change course if it's going off track.

by fluidcruft1 hours ago|

parent|

[-]

That just seems like a UI difference. I've always interrupted Claude Code, added a comment, and it's continued without much issue. Otherwise, if you just type, the message is queued up next. There's no real reason to prefer one over the other, except it sounds like Codex can't queue messages?

by ghosty1414 hours ago|

prev|

[-]

I'm personally 100% convinced (assuming prices stay reasonable) that the Codex approach is here to stay.

Having a human in the loop eliminates all the problems that LLMs have, and continuously reviewing smallish chunks of code works really well in my experience.

It saves so much time having Codex do all the plumbing so you can focus on the actual "core" part of a feature.

LLMs still (and I doubt that changes) can't think and generalize. If I tell Codex to implement 3 features he won't stop and find a general solution that unifies them unless explicitly told to. This makes it kinda pointless for the "full autonomy" approach, since effectively code quality and abstractions completely go down the drain over time. That's fine if it's just prototyping or "throwaway" scripts, but for bigger codebases where longevity matters it's a dealbreaker.

by _zoltan_3 hours ago|

parent|

[-]

I'm personally 100% convinced of the opposite, that it's a waste of time to steer them. We know now that agentic loops can converge given the proper framing and self-reflectiveness tools.

by sealeck3 hours ago|

parent|

[-]

Converge towards what though... I think the level of testing/verification you need to have an LLM output a non-trivial feature (e.g. Paxos/anything with concurrency, business logic that isn't just "fetch value from spreadsheet, add to another number and save to the database") is pretty high.

by replygirl1 hours ago|

parent|

[-]

In the new world, engineers have to actually be good at capturing and interpreting requirements.

by zeroxfe1 hours ago|

parent|

prev|

[-]

> it's a waste of time to steer them

It's not a waste of time, it's a responsibility. All things need steering, even humans -- there's only so much precision that can be extrapolated from prompts, and as the tasks get bigger, small deviations can turn into very large mistakes.

There's a balance to strike between micro-management and no steering at all.

by bcarv1 hours ago|

parent|

prev|

[-]

Does the AI agent know what your company is doing right now, what every coworker is working on, how they are doing it, and how your boss will change priorities next month without being told?

If it really knows better, then fire everyone and let the agent take charge. lol

by IMTDb6 minutes ago|

parent|

[-]

A significant portion of engineering time is now spent ensuring that yes, the LLM does know about all of that. This context can be surfaced through skills, MCP, connectors, RAG over your tools, etc. Companies are also starting to reshape their entire processes to ensure this information can be properly and accurately surfaced. Most are still far from completing that transformation, but progress tends to happen slowly, then all at once.

by hyldmo1 hours ago|

parent|

prev|

[-]

No, but Codex wouldn’t have asked you those questions either

by bcarv1 hours ago|

parent|

[-]

For me, it still asks for confirmation at every decision when using plans. And when multiple unforeseen options appear, it asks again. I don’t think you’ve used Codex in a while.

by NuclearPM31 minutes ago|

parent|

prev|

[-]

> If I tell Codex to implement 3 features he won't stop and find a general solution that unifies them unless explicitly told to

That could easily be automated.

by utilize18084 hours ago|

prev|

[-]

I think it's the opposite. Especially considering Codex started out as a web app that offers very little interactivity: you are supposed to drop a request and let it run autonomously in a containerized environment; you can then follow up on it via chat --- no interactive code editing.

by Rperry21744 hours ago|

parent|

[-]

Fair, I agree that was true of early Codex, and it was my perception too. But two announcements came out today, and that's what I'm referring to.

Specifically, the GPT-5.3 post explicitly leans into "interactive collaborator" language and steering mid-execution.

OpenAI post: "Much like a colleague, you can steer and interact with GPT-5.3-Codex while it’s working, without losing context."

OpenAI post: "Instead of waiting for a final output, you can interact in real time—ask questions, discuss approaches, and steer toward the solution"

Claude post: "Claude Opus 4.6 is designed for longer-running, agentic work — planning complex tasks more carefully and executing them with less back-and-forth from the user."

by fluidcruft1 hours ago|

parent|

[-]

Frankly, it seems to me that Codex is playing catch-up with Claude Code, and Claude Code is just continuing to move further ahead. The thing with Claude Code is it will work longer... if you want it to. It's always had good oversight and (at least for me) it builds trust slowly until you are wishing it would do more at once. Codex has been getting better, but when I used it back in the day it would just do things and say it's done, and you'd be sitting there wondering "wtf are you doing?". Claude Code is more the opposite: you can watch as closely as you want, and often you get to a point where you have enough trust and experience with it that you know what it's going to do and don't want to bother.

by bob10292 hours ago|

prev|

[-]

I think there is another philosophy where the agent is domain specific. Not that we have to invent an entirely new universe for every product or business, but that there is a small amount of semi-customization involved to achieve an ideal agent.

I would much rather work with things like the Chat Completion API than any frameworks that compose over it. I want total control over how tool calling and error handling works. I've got concerns specific to my business/product/customer that couldn't possibly have been considered as part of these frameworks.

Whether or not a human needs to be tightly looped in could vary wildly depending on the specific part of the business you are dealing with. Having a purpose-built agent that understands where additional verification needs to occur (and not occur) can give you the best of both worlds.
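A rough sketch of what that kind of hand-rolled control could look like, assuming a plain dispatch function that sits between the model's tool calls and your own code. Everything here is hypothetical for illustration: the `Tool` class, the `needs_human_approval` flag, the `dispatch` helper, and the two example tools are not part of any real API, and the tool-call dict shape is just an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    """A hypothetical tool the agent may call."""
    name: str
    run: Callable[[dict], str]
    # Domain-specific gate: only some tools pull a human into the loop.
    needs_human_approval: bool = False


def dispatch(tool_call: dict, tools: dict[str, Tool],
             approve: Callable[[str, dict], bool]) -> str:
    """Run one tool call and always return a string the model can read."""
    name = tool_call.get("name", "")
    args = tool_call.get("arguments", {})
    tool = tools.get(name)
    if tool is None:
        return f"error: unknown tool {name!r}"
    if tool.needs_human_approval and not approve(name, args):
        return f"error: human rejected call to {name!r}"
    try:
        return tool.run(args)
    except Exception as exc:
        # Error handling stays under your control: report instead of crashing.
        return f"error: {name!r} failed: {exc}"


# Hypothetical tools: reading is free, refunds need sign-off.
tools = {
    "read_invoice": Tool("read_invoice", lambda a: f"invoice {a['id']}: $120"),
    "issue_refund": Tool("issue_refund", lambda a: f"refunded {a['id']}",
                         needs_human_approval=True),
}
```

With this shape, `dispatch({"name": "issue_refund", "arguments": {"id": 7}}, tools, approve)` only runs if `approve` returns true, while `read_invoice` never asks. Where the gate goes is a per-business decision, which is the point being made above.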

by jhancock2 hours ago|

prev|

[-]

Good breakdown.

I usually want the Codex approach for code/product "shaping" iteratively with the AI.

Once things are shaped and common "scaling patterns" are well established, then for things like adding a front end (which is constantly changing, more views), letting the autonomous approach run wild can *sometimes* be useful.

I have found that Codex is better at remembering when I ask it to not get carried away... whereas Claude requires constant reminders.

by mcintyre19944 hours ago|

prev|

[-]

This kind of sounds like both of them stepping into the other’s turf, to simplify a bit.

I haven’t used Codex but use Claude Code, and the way people (before today) described Codex to me was like how you’re describing Opus 4.6.

So it sounds like they’re converging toward “both these approaches are useful at different times” potentially? And neither wants people who prefer one way of working to be locked to the other’s model.

by giancarlostoro4 hours ago|

prev|

[-]

> With Opus 4.6, the emphasis is the opposite: a more autonomous, agentic, thoughtful system that plans deeply, runs longer, and asks less of the human.

This feels wrong. I can't comment on Codex, but Claude will prompt you and ask before changing files. Even when I run it in dangerous mode on Zed, I can still review all the diffs and undo them, or, you know, tell it what to change. If you're worried about it making too many decisions, you can pre-prompt Claude Code (via .claude/instructions.md) and instruct it to always ask follow-up questions regarding architectural decisions.

Sometimes I go out of my way to tell Claude DO NOT ASK ME FOR FOLLOW UPS JUST DO THE THING.

by Rperry21744 hours ago|

parent|

[-]

yeah I'm mostly just talking about how they're framing it: "Claude Opus 4.6 is designed for longer-running, agentic work — planning complex tasks more carefully and executing them with less back-and-forth from the user"

I guess it's also quite interesting that how they are framing these projects is the opposite of how people currently perceive them, and I guess that may be a conscious choice...

by giancarlostoro4 hours ago|

parent|

[-]

I get what you mean now, and to be fair, I like that. Sometimes I want Claude to tell me some architectural options, so I ask it so I can think about what my options are; sometimes I rethink my problem if I like Claude's conclusion.

by techbro_1a3 hours ago|

prev|

[-]

> With Codex (5.3), the framing is an interactive collaborator: you steer it mid-execution, stay in the loop, course-correct as it works.

This is true, but I find that Codex thinks more than Opus. That's why 5.2 Codex was more reliable than Opus 4.5.

by hbarka2 hours ago|

prev|

[-]

How can they be diverging? LLMs are built on similar foundations, aka the Transformer architecture. Do you mean the training method (RLHF) is diverging?

by iranintoavan2 hours ago|

parent|

[-]

I'm not OP, but I suspect they mean the products / tooling / company direction, not necessarily the underlying LLM architecture.

by cchance4 hours ago|

prev|

[-]

Just because you can inject steering doesn't mean they steered away from long running...

There are hundreds of people who upload Codex 5.2 running for hours unattended and coming back with full commits.

by blurbleblurble2 hours ago|

prev|

[-]

Funny cause the situation was totally flipped last iteration.

by rozumbrada3 hours ago|

prev|

[-]

I've read this exact comment, in what I would say are exactly the same words, several times on X, and I would bet my money it's LLM-generated by someone who has not even tried both tools. This AI slop, even on a site like this with no direct monetisation from fake engagement, is making me sick...

by pyrolistical2 hours ago|

prev|

[-]

Boeing vs Airbus philosophy

by d--b4 hours ago|

prev|

[-]

I am definitely using Opus as an interactive collaborator that I steer mid-execution, stay in the loop and course-correct as it works.

I mean, Opus asks a lot whether it should run things, and each time you can tell it to change. And if that's not enough you can always press esc to interrupt.

by adarsh23212 hours ago|

prev|

[-]

[dead]