Same. I stopped my Pro subscription yesterday after entering the week with 70% of my tokens used by Monday morning (on light, small weekend projects, things I had worked on in the past and barely noticed a dent in usage.) Support was... unhelpful.

It's been funny watching my own attitude to Anthropic change, from being an enthusiastic Claude user to pure frustration. But even that wasn't the trigger to leave, it was the attitude Support showed. I figure, if you mess up as badly as Anthropic has, you should at least show some effort towards your customers. Instead I just got a mass of standardised replies, even after the thread replied I'd be escalated to a human. Nothing can sour you on a company more. I'm forgiving to bugs, we've all been there, but really annoyed by indifference and unhelpful form replies with corporate uselessness.

So if 4.7 is here? I'd prefer they forget models and revert the harness to its January state. Even then, I've already moved to Codex as of a few days ago, and I won't be maintaining two subscriptions, it's a move. It has its own issues, it's clear, but I'm getting work done. That's more than I can say for Claude.

reply
> It's been funny watching my own attitude to Anthropic change, from being an enthusiastic Claude user to pure frustration.

You were enthusiastic because it was a great product at an unsustainable price.

It's clear that Claude is now harnessing their model because giving access to their full model is too expensive for the $20/m that consumers have settled on as the price point they want to pay.

I wrote a more in depth analysis here, there's probably too much to meaningfully summarize in a comment: https://sustainableviews.substack.com/p/the-era-of-models-is...

reply
Off topic, but I really like the writing style on your blog. Do you have any advice for improving my own? In an older comment[1], you mentioned the craft of sharpening an idea to a very fine, meaningful, well-written point. Are there any books, or resources you’d recommend for honing that craft? Thanks in advance.

[1] https://news.ycombinator.com/item?id=44082994

reply
The thing that inspires my writing is that the best sentences are self evident. Meaning you declare it without evidence and it feels so intuitively right to most people. It resonates, either being their lived experience, or being the inevitable conclusion of a line of thinking.

Making a sentence like that requires deeply understanding a problem space to the point where these sentences emerge, rather than any "craft" of writing.

So the craft is thinking through a topic, usually by writing about it, and then deleting everything you've written because you arrived at the self evident position, and then writing from the vantage point of that self evident statement.

I feel that writing is a personal craft and you must dig it out of yourself through the practice of it, rather than learn it from others. The usage of AI as a resource makes this much clearer to me. You must be confident in your own writing not because it is following best practices or techniques of others but because it is the best version of your own voice at the time of being written.

reply
Curious why you think that? Stuff like

> Yes, there is a relative scale level...

> Yes, having the smartest model will...

> yes Chinese AI companies have ...

yes yes yes, I didn't say anything, why write in a way that insinuates that I was thinking that?

I mean it doesn't come off as AI slop, so that's yay in 2026. But why do you think it is so good?

reply
haha it is poorly written, it's one of my pieces with the fewest drafts, i just wrote it and clicked submit to get the thoughts out of my head.

I think he is referring to the art of refining an idea though, which I do have something to say on his comment.

reply
I agree with what you have written, which is why I would never pay a subscription to an external AI provider.

I prefer to run inference on my own HW, with a harness that I control, so I can choose myself what compromise between speed and the quality of the results is appropriate for my needs.

When I have complete control, resulting in predictable performance, I can work more efficiently, even with slower HW and with somewhat inferior models, than when I am at the mercy of an external provider.

reply
What’s your setup?
reply
For now, the most suitable computer that I have for running LLMs is an Epyc server with 128 GB DRAM and 2 AMD GPUs with 16 GB of HBM memory each.

I have a few other computers with 64 GB DRAM each and with NVIDIA, Intel or AMD GPUs. Fortunately all that memory has been bought long ago, because today I could not afford to buy extra memory.

However, very recently, i.e. the previous week, I started working on modifying llama.cpp to allow optimized execution with weights stored on SSDs, e.g. by using a couple of PCIe 5.0 SSDs, in order to be able to use bigger models than those that can fit inside 128 GB, which is the limit of what I have tested until now.

By coincidence, this week there have been a few threads on HN that have reported similar work for running locally big models with weights stored in SSDs, so I believe that this will become more common in the near future.

The speeds previously achieved when running from SSDs range from one token every few seconds to a few tokens per second. While such speeds would be low for a chat application, they can be adequate for a coding assistant, if the improved code that is generated compensates for the lower speed.
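
As a rough back-of-envelope for where numbers like these come from (a sketch under stated assumptions, not a benchmark): if generation is limited purely by how fast weights can be streamed from the SSDs, tokens per second is roughly the aggregate read bandwidth divided by the bytes that must be read per token. Every figure below is assumed for illustration only: about 14 GB/s sequential read per PCIe 5.0 SSD, a hypothetical dense model whose 200 GB of weights are all read for each token, and a hypothetical sparse (mixture-of-experts) model that touches only 20 GB per token. Any caching of weights in DRAM is ignored.

```python
def tokens_per_second(bytes_read_per_token: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on generation speed when weight streaming is the only bottleneck."""
    return bandwidth_bytes_per_s / bytes_read_per_token


GB = 1e9

# Assumption: two SSDs read in parallel at ~14 GB/s each (illustrative figure).
ssd_bandwidth = 2 * 14 * GB

# Assumption: hypothetical dense model, all 200 GB of weights read per token.
dense_tps = tokens_per_second(200 * GB, ssd_bandwidth)

# Assumption: hypothetical sparse model, only 20 GB of weights touched per token.
sparse_tps = tokens_per_second(20 * GB, ssd_bandwidth)

print(f"dense:  {dense_tps:.2f} tokens/s ({1 / dense_tps:.1f} s per token)")
print(f"sparse: {sparse_tps:.2f} tokens/s")
```

Under these assumptions the dense case lands at one token every several seconds and the sparse case at a bit over one token per second, which is the same range as the speeds mentioned above.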

reply
Thank you for that, it's very interesting. I keep wanting to find time to try out a local only setup with an NVIDIA 4090 and 64 GB of RAM. It seems like it may be time to try it out.
reply
At my job and for personal projects I pay per token with claude and I've had no problems at all with it. No slowdowns, no "throttling", nothing.

I'm honestly surprised how many people have subscriptions and are expecting anthropic to eat the cost lol

reply
My bad — I had Max, so more than $20. I can’t edit the comment any more. Can’t keep track of the names. I wonder when ‘pro’ started to mean ‘lowest tier’.

But your article is interesting. You think some of the degradation is because when I think I’m using Opus they’re giving me Sonnet invisibly?

reply
Hard to say, but the fact is the intelligence was there and now it's not.

Maybe they are giving Sonnet, or maybe a distilled Opus, or maybe Opus but with lower context, not quite sure but intelligence costs compute so less intelligence means cheaper compute.

reply
I used the $60/mo subscription and I bet most developers get access to AI agents via their company, and there was no difference. They should have reduced the rate limits, or offered a new model, anything except silently reduce the quality of their flagship product to reduce cost.

The cost of switching is too low for them to be able to get away with the standard enshittification playbook. It takes all of 5 minutes to get a Codex subscription and it works almost exactly the same, down to using the same commands for most actions.

reply
Thank goodness for capitalism for providing multiple competitors to multibillion dollar companies
reply
So instead of breaking shit they should have just increased their prices.
reply
I've given up on Claude after seeing the response quality degrade so much over the past two weeks, and now this? I've unsubscribed. I don't know why people are still giving this company money.
reply
It seems like the big companies they're providing Mythos to are their only concern right now.
reply
Corporate software in general is often chosen based on the value returned simply being "good enough" most of the time, because the actual product being purchased is good controls for security, compliance, etc.

A corporate purchaser is buying hundreds to thousands of Claude seats and doesn't care very much about perceived fluctuations in the model performance from release to release. They're invested in ties into their SSO and SIEM and every other internal system, they have trained their employees, and there's substantial cost to switching even in a rapidly moving industry.

Consumer end-users are much less loyal, by comparison.

reply
Same here, working with claude code has been unproductive since March; everyone on my team has complained about the decline in claude code quality, which is why we’re switching to Codex.
reply
I haven't been using my claude sub lately but I liked 4.6 three weeks ago. Did something change?
reply
2 weeks ago the rolling session usage plummeted to borderline unusable. I'd say I get a weekly output equivalent to 2 session windows before change.
reply
I didn't experience that at all. I know there are lots of rumblings around here about that, but I'm posting this to show this wasn't a universal experience.
reply
https://marginlab.ai/trackers/claude-code/

Seems like there is evidence for that.

reply
Even just in chats with Opus 4.6 I noticed hitting limits so much faster.
reply
It's funny watching llm users act like gamblers. Every other week swearing by one model and cursing another, like a gambler who thinks a certain slot machine or table is cold this week. These llm companies are literally building slot machine mechanics into their UIs too, I don't think this phenomenon is a coincidence.

Stop using these dopamine brain poisoning machines, think for yourself, don't pay a billionaire for their thinking machine.

reply
Don't confuse the many voices of a crowd with a single person's fickle view. If you can track an individual person or organization who changes their mind 'every other week' then more power to you, but unless you're performing that longitudinal study you are simply seeing differential levels of enthusiasm.
reply
I get what you mean but they're all over twitter, it's not random levels of enthusiasm, follow a few heavy llm users who tweet a lot and you'll see what I mean.
reply
> Stop using these dopamine brain poisoning machines, think for yourself, don't pay a billionaire for their thinking machine.

Yeah, and also stop using these things they call "computers", think for yourself, write your texts by hand, send letters to people. /s

reply
When did I say to stop using computers? You don't prefer to think for yourself? You're cooked.
reply
Funny because many people here were so confident that OpenAI is going to collapse because of how much compute they pre-ordered.

But now it seems like it's a major strategic advantage. They're 2x'ing usage limits on Codex plans to steal CC customers and it seems to be working. I'm seeing a lot of goodwill for Codex and a ton of bad PR for CC.

It seems like 90% of Claude's recent problems are strictly lack of compute related.

reply
> people here were so confident that OpenAI is going to collapse because of how much compute they pre-ordered

That's not why. It was and is because they've been incredibly unfocused and have burnt through cash on ill-advised, expensive things like Sora. By comparison Anthropic have been very focused.

reply
I don't think that was the main reason for people thinking OpenAI is going to collapse here.

By far, the biggest argument was that OpenAI bet too much on compute.

Being unfocused is generally an easy fix. Just cut things that don't matter as much, which they seem to be doing.

reply
Nobody was talking about them betting too much on compute, people were saying that their shady deals on compute with NVIDIA and Oracle were creating a giant bubble in their attempt to get a Too Big To Fail judgement (in their words- taxpayer-backed "backstop").
reply
It really wasn't. Most of the argument was around product portfolio and agentic coding performance.
reply
That’s just short term talk. The main thesis behind their collapse is that they won’t be able to pay their compute bills because they won’t have enough demand to.
reply
That doesn't really track because their compute isn't like a debt obligation.

The compute topic was more around how OpenAI, Nvidia, Oracle, and others were all announcing commitments to spend money on each other in a circular way which could just net out to zero value.

reply
Honestly it seems like each major player here fumbles the ball in turn, quite fun to observe. But hey, it's a difficult game.
reply
To me it seems like they burn so much money they can do lots of things in parallel. My guess would be that e.g. codex and sora are very independently developed. After all there's quite a hard limit on how many bodies are beneficial to a software project.
reply
They all compete internally over constrained compute resources - for R&D and production.
reply
Personally I think it's down to Altman having the cognitive capacity of a sleeping snail, the world insight of a hormonal 14 year old who's only ever read one series of manga.

Despite having literal experts at his fingertips, he still isn't able to grasp that he's talking unfiltered bollocks most of the time. Not to mention his Jason-level "oath breaking"/dishonesty.

reply
> By comparison Anthropic have been very focused.

Ah yes, very focused on crapping out every possible thing they can copy and half bake?

reply
> I'm seeing a lot of goodwill for Codex and a ton of bad PR for CC.

AI is one of the things about which you cannot find genuine opinions online. Just like politics. If you visit, say, r/codex, you'll see all the people complaining about how their limits are consumed by "just N prompts" (N is a ridiculously small integer).

It's all astroturfed from all sides.

reply
I agree. And I am seeing it in a lot of venues, especially political discourse. Commenting is increasingly AI driven. I fear the whole thing is going to collapse and nobody will be able to rely on online commentary to make decisions, at least not without a lot of independent research. Maybe that’s for the best, but it’s definitely going to change the Internet.
reply
Seems very short term. Like how cheap Uber was initially. Like Claude was before!

Eventually OpenAI will need to stop burning money.

reply
OpenAI will need to stop burning money eventually, but so does everyone else in the space. The longer they can do this the more squeeze it puts on their competitors.

I would call out though that I think there is one way in which this differs from the Uber situation. Theoretically at some point we should hit a place where compute costs start to come down either because we've built enough resources or because most tasks don't need the newest models and a lot of the work people are doing can be automatically sent to cheaper models that are good enough. Unless Uber's self driving program magically pops back up, Uber doesn't really have that since their biggest expense is driver wages.

I think it's a long shot, but not impossible, that OpenAI can subsidize costs long enough that prices don't need to go too much higher to be sustainable.

reply
My standing assumption is the darling company/model will change every quarter for the foreseeable future, and everyone will be equally convinced that the hotness of the week will win the entire future.

As buyers, we all benefit from a very competitive market.

reply
This is the primary reason I won’t sign up for an annual plan.
reply
In hindsight, it is painfully clear that Anthropic’s conservative investment strategy has them struggling to keep up with demand and caused their profit margin to shrink significantly as the last buyer of compute.
reply
they've also introduced a lot of caching and token burn related bugs which makes things worse. any bug that multiplies the token burn also multiplies their infrastructure problems.
reply
Funny because the general consensus is that everyone is burning money so fast that they would not be able to get it back from their AI business in the near future. OpenAI is simply the one with the most aggressive expenditure. Google has its own cash cows. Anthropic has been conservative all around.
reply
Is that 2x still going on? I thought that ended in early April.
reply
Different plan. The old 2x has been discontinued, and the bonus is now (temporarily) available for the new $100 plan users in an effort, presumably, to entice them away from Anthropic.
reply
For the $200 users, it never ended.
reply
It’s for Pro users only, I think the 2x is up to May 31.
reply
They did it again to "celebrate" the release of the $100 plan.
reply
On plus?
reply
That’s more a leadership decision because Anthropic are nerfing the model to cut costs, if they stop doing that then they’ll stay ahead.
reply
Proof they are nerfing the model? It is stable in benchmarks: https://marginlab.ai/trackers/claude-code-historical-perform...

All this just reads like just another case of mass psychosis to me

reply
Proof they don't nerf it only after testing that the benchmarks there stay the same? So overall performance degrades but they isolate those benchmarks?
reply
You are dramatically overestimating how much time people have to waste at these smaller hypergrowth companies
reply
Their top tier plan got a 3x limit boost. This has been the first week ever where I haven't run out of tokens.
reply
The market here is extraordinarily vibes-based, and burning billions of dollars for an ephemeral PR boost, which might only last another couple of weeks until people find a reason to hate Codex, does not reflect well on OAI's long term viability.
reply
> It seems like 90% of Claude's recent problems are strictly lack of compute related.

Downtime is annoying, but the problem is that over the past 2-3 weeks Claude has been outrageously stupid when it does work. I have always been skeptical of everything produced - but now I have no faith whatsoever in anything that it produces. I'm not even sure if I will experiment with 4.7, unless there are glowing reviews.

Codex has had none of these problems. I still don't trust anything it produces, but it's not like everything it produces is completely and utterly useless.

reply
So many people confuse sycophantic behavior with producing results.
reply
I have both Claude and OpenAI, side by side. I would say Sonnet 4.6 still beats GPT 5.4 for coding (at least in my use case). But after about 45 minutes I'm out of my window, so I use OpenAI for the next 4 hours and I can't even reach my limit.
reply
Most of the compute OpenAI "preordered" is vapour. And it has nothing to do with why people thought the company -- which is still in extremely rocky rapids -- was headed to bankruptcy.

Anthropic has been very disciplined and focused (overwhelmingly on coding, fwiw), while OpenAI has been bleeding money trying to be the everything AI company with no real specialty as everyone else beat them in random domains. If I had to qualify OpenAI's primary focus, it has been glazing users and making a generation of malignant narcissists.

But yes, Anthropic has been growing by leaps and bounds and has capacity issues. That's a very healthy position to be in, despite the fact that it yields the inevitable foot-stomping "I'm moving to competitor!" posts constantly.

reply
How is droves of your customers leaving, whether they're foot stomping or not, healthy?
reply
Droves? I mean, if we take the "I'm leaving!" posts seriously, a company with people so emotionally invested that they feel the need to announce their departure is in a pretty good place. Some tiny sampling of unhappy customers is indicative of nothing.

Honestly at this point I am pretty firmly of the belief that OAI is paying astroturfers to post the "Boy does anyone else think Claude is dumb now and Codex is better?" comments (always some unreproducible "feel" kind of thing that is to be accepted at face value despite overwhelming evidence that we shouldn't). OAI is kind of in the desperation stage -- see the bizarre acquisitions they've been making, including paying $100M for some fringe podcast almost no one had heard of -- and it would not be remotely unexpected.

reply
We have no idea of the ratio of foot stompers to quiet quitters, but I'm sure most people don't announce it. I cancelled my subscription and hadn't told anybody. And I quit based on personal experience over the last few weeks, not on social media PR.
reply
All of the smart people I know went to work at OpenAI and none at Anthropic. In addition to financial capital, OpenAI has a massive advantage in human capital over Anthropic.

As long as OpenAI can sustain compute and paying SWE $1million/year they will end up with the better product.

reply
Attracting talent with huge sums of money just gets you people who optimize for money, and it's usually never a good long-term decision. I think it's what led to Google's downturn.
reply
Google is doing great still. One of the few FAANG I am bullish on over the long timescale.
reply
> I think it's what led to Google's downturn.

What downturn is that exactly?

reply
> OpenAI has a massive advantage in human capital over Anthropic.

but if your leader is a dipshit, then it's a waste.

Look, you can't just throw money at the problem, you need people who are able to make the right decisions at the right time. That requires leadership. Part of the reason why facebook fucked up VR/AR is that they have a leader who only cares about features/metrics, not user experience.

Part of the reason why twitter always lost money is because they had loads of teams all running in different directions, because Dorsey is utterly incapable of making a firm decision.

It's not money and talent, it's execution.

reply
Are those "smart people you know" machine learning researchers?
reply
No, infrastructure engineers. The ones who scale the system up so you don’t have to rate limit.
reply
I switched to Codex and found it extremely inferior for my use case.

It is much faster, but faster worse code is a step in the wrong direction. You're just rapidly accumulating bugs and tech debt, rather than more slowly moving in the correct direction.

I'm a big fan of Gemini in general, but at least in my experience Gemini Cli is VERY FAR behind either Codex or CC. It's both slower than CC, MUCH slower than Codex, and the output quality considerably worse than CC (probably worse than Codex and orders of magnitude slower).

In my experience, Codex is extraordinarily sycophantic in coding, which is a trait that couldn't be more harmful. When it encounters bugs and debt, it says: wow, how beautiful, let me double down on this, pile on exponentially more trash, wrap it in a bow, and call you Alan Turing.

It also does not follow directions. When you tell it how to do something, it will say, nah, I have a better faster way, I'll just ignore the user and do my thing instead. CC will stop and ask for feedback much more often.

YMMV.

reply
What is your use case? I read comments like this and it's the total opposite of my experience. I have both CC Opus 4.6 and Codex 5.4, and Codex is much more thorough and checks before it starts making changes, maybe even to a fault, but I accept that because getting Opus to redo work after it jumps in and messes up the first attempt is a massive waste of time. All my tasks and specs are atomic and granularly spec'd, and I'd say 30% of the time I regret deciding to use Opus for 'simpler' work.
reply
I'm building a correct, safe, highly understandable, concurrent runtime & language.

Essentially Rust/Tokio if it was substantially easier than even Go - and without a need for crates and a subset of the language to achieve near Ada-level safety.

The codebase is ~100k lines of code.

reply
>> I switched to Codex and found it extremely inferior for my use case.

Yeah, 100% the case for me. I sometimes use it to do adversarial reviews on code that Opus wrote but the stuff it comes back with is total garbage more often than not. It just fabricates reasons as to why the code it's reviewing needs improvement.

reply
My tinfoil hat theory, which may not be that crazy, is that providers are sandbagging their models in the days leading up to a new release, so that the next model "feels" like a bigger improvement than it is.

An important aspect of AI is that it needs to be seen as moving forward all the time. Plateaus are the death of the hype cycle, and would tether people's expectations closer to reality.

reply
Possibly due to moving compute from inference to training
reply
My purely unfounded, gut reaction to Opus 4.7 being released today was "Oh, that explains the recent 4.6 performance - they were spinning up inference on 4.7."

Of course, I have no information on how they manage the deployment of their models across their infra.

reply
I was there too, but honestly after today, 4.7 "feels" just as bad. I was cynical, but also kind of eager for the improvement. It's just not there. Compared to early Feb, I have to babysit EVERYTHING.
reply
Codex really has its place in my bag. I mainly use it, rarely Claude.

Codex just gets it done. Very self-correcting by design while Claude has no real base line quality for me. Claude was awesome in December, but Codex is like a corporate company to me. Maybe it looks uncool, but can execute very well.

Also Web Design looks really smooth with Codex.

OpenAI really impressed me and continues to impress me with Codex. OpenAI made no fuss about it, and instead let the results speak. It is as if Codex has no marketing department, just its product quality - kind of like Google in its early days with every product.

reply
I guess our conscience of OpenAI working with the Department of War has an expiry date of 6 weeks.
reply
That number is generous, and is also a pretty decent lifespan for a socially-conscious gesture in 2026.
reply
Most people just want to use a tool that works. Not everything has to be a damn moral crusade.
reply
Yes, let's take morality out of our daily lives as much as possible... That seems like a great categorical imperative and a recipe for social success
reply
There's nothing moral about Anthropic. Especially to those of us who are not American citizens and to which Dario's pronouncements about ethics apparently do not apply, as stated in his own press release.

To me it just looks like a big sanctimonious festival of hypocrisy.

reply
That's an incredibly uncharitable take on what I said. But that kind of proves my point.

Foist your morality upon everyone else and burden them with your specific conscience; sounds like a fun time.

reply
What is the charitable way to look at it then?
reply
How about assuming the positive intent of what I actually said? Not everything has to be a moral crusade. Let me use the tool without pushing your personal moral opinions on me.

The same person wringing their hands over OpenAI, buys clothing made from slave labor and wrote that comment using a device with rare earth materials gotten from slave labor. Why is OpenAI the line? Why are they allowed to "exploit people" and I'm not?

Taken to its logical conclusion it's silly. And instead of engaging with that, they deflect with oH yEaH lEtS hAvE nO mOrAlS which is clearly not what I'm advocating.

reply
My most charitable interpretation of what you are saying is: Two wrongs make a right. If others exploit people that makes it an acceptable thing for me to do. No one can criticize me for doing a bad thing because others also do bad things. Is that what you are saying?

I genuinely cannot see how to interpret it in a way that is positive.

reply
Yeah, why actually engage with moral issues when we can just defer to a status quo that happens to benefit me?
reply
"Not everything" - sure, but mass surveillance and autonomous killing are kind of big things to sweep under that rug no?
reply
We all liked the Terminator movies. Hopefully they stay as movies.
reply
I quoted 2 weeks at the time. I think even that was generous.
reply
Thing is that Anthropic was always working with DoD, too, and the line in the sand they drew looked really noble until I found it did not apply to me, a non-US citizen. Dario made it clear that was the case.

And so the difference, to me, was irrelevant. I'll buy based on value, and keep a poker in the fire of Chinese & European open weight models, as well.

reply
Nah, I believe most people here who immediately brag about Codex are OpenAI employees doing part of their job. Otherwise I couldn't possibly fathom why anyone would use Codex. In my company 80% is Claude and 15% Gemini; you can barely see OpenAI on the graph. And we have >5k programmers using AI every day.
reply
I’m thinking the same thing, Codex literally ruined the codebases that I experimented with it on.
reply
Currently GPT just works much better, and so does Gemini but it's more expensive right now. Going through Opencode stats, their claim is that Gemini is the current best model followed by GPT 5.4 on their benchmarks, but the difference is slim.

My personal experience is best with GPT but it could be the specific kind of work I use it for which is heavy on maths and cpp (and some LISP).

reply
OpenAI replaced its founding engineers with Meta PMs. The shift towards consumer engagement metrics and marketing is apparent.
reply
You can believe whatever you want. I found claude unusable due to limits. Codex works very well for my use cases.
reply
Not everyone is American, and people who are not see Anthropic state they are willing to spy on our countries and shrug about OAI saying the same about America. What’s the difference to us?
reply
if you're not american you should be worried about the bit of using AI to kill people which was the other major objection by Anthropic.

(not that I think the US DoD wouldn't do that anyway, ToS or not.)

reply
well, if they put in a fully automated kill chain, it's gonna be weak to attacks like making yourself look like a car, or a video game styled "hide under a box"

the current non-automated kill chain has targeted fishermen and a girls' school. Nobody is gonna be held accountable for either.

Am I worried about the killing or the AI? If I'm worried about the killing, I'd much rather push for US demilitarization.

reply
Anthropic's issue was only that the AI isn't yet good enough to tell who's an American, so it avoids killing them. They were fine with the "killing non-Americans" bit.
reply
OK, I am worried.

Now, what can I actually do?

reply
Vote with your dollar. Ask others to do the same and explain why. If we all did this, it might matter. There’s not a lot else an individual can do.
reply
Dario in fact said it was ok to spy and drone non-US citizens, and in fact endorsed American foreign policy generally.

So, no, I'm not voting with my wallet for one American company versus the other. I'll pick the best compromise product for me, and then also boost non-American R&D where I can.

reply
Vote with your wallet, just like Americans.
reply
Not only is Anthropic perfectly happy to let the DoD use their products to kill people, but they are partners with Palantir and were apparently instrumental in the strikes against Iran by the US military.

https://www.washingtonpost.com/technology/2026/03/04/anthrop...

So uh, yeah, the only difference I see between OAI and Anthropic is that one is more honest about what they’re willing to use their AI for.

reply
Longer than how long anyone cared about epstein.
reply
I've been using it with `/effort max` all the time, and it's been working better than ever.

I think here's part of the problem, it's hard to measure this, and you also don't know in which AB test cohorts you may currently be and how they are affecting results.

reply
Agree. I keep effort max on Claude and xhigh on GPT for all tasks and keep tasks as scoped units of work instead of boil the ocean type prompts. It is hard to measure but ultimately the tasks are getting completed and I'm validating so I consider it "working as expected".
reply
It works better, until you run out of tokens. Running out of tokens is something that used to never happen to me, but this month now regularly happens.

Maybe I could avoid running out of tokens by turning off 1M tokens and max effort, but that's a cure worse than the disease IMO.

reply
I would risk a guess that people have a wrong intuition about the long-context pricing and are complaining because of that.

Yeah, the per-token price stays the same, even with large context. But that still means that you're spending 4x more cache-read tokens in a 400k context conversation, on each turn, than you would be in a 100k context conversation.
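To make that arithmetic concrete, here is a rough sketch. It assumes the whole context is re-read from cache on every turn and that the context size stays fixed (in practice it grows); the function name and the turn count are made up for illustration.

```python
def cache_read_tokens(context_tokens: int, turns: int) -> int:
    """Total cache-read tokens if every turn re-reads the full context.

    Assumption: context size is constant across turns, which understates
    real usage because each turn normally appends to the context.
    """
    return context_tokens * turns


small = cache_read_tokens(100_000, turns=20)
large = cache_read_tokens(400_000, turns=20)

# Same per-token price, but four times as many tokens billed per turn.
print(small, large, large / small)
```

The per-token price never changes here; only the number of tokens read per turn does, which is where the 4x comes from.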

reply
Personally I find using and managing Claude sessions and limits is getting exhausting and feels similar to calorie counting. You think you are going to have an amazing low-calorie meal, only to realize the meal is full of processed sugars and you overshot the limit within 2-3 bites. Now: "You have exhausted your limit for this time. Your session limit resets in the next 4 hrs".
reply
Yep, it just feels terrible, the usage bars give me anxiety, and I think that's in their interest as they definitely push me towards paying for higher limits. Won't do that, though.
reply
Until the next time they push you back to Claude. At this point, I feel like this has to be the most unstable technology ever released. Imagine if docker had stopped working every two releases
reply
There is zero cost to switching ai models. Paid or open source. It's one line mostly.
reply
What about your chat history? That has some value, at least for me. But what has even more value is stable releases.
reply
You can output it as a memory using a simple prompt. You could probably re-use this prompt for any product with only slight modification. Or you could prompt the product to output an import prompt that is more tuned to its requirements.

e.g. https://claude.com/import-memory

reply
This is one of the many reasons I don't think the model companies are going to win the application space in coding.

There's literally zero context lost for me in switching between model providers as a cursor user at work. For personal stuff I'll use an open source harness for the same reason.

reply
I don't see any value in chat history. I delete all conversations at least weekly, it feels like baggage.
reply
I think this is more about which model you steer your coding harness to. You can also self-host a UI in front of multiple models, then you own the chat history.
reply
for me there is zero value there.
reply
Codex doesn't read Claude.md like Claude does. It's not a "one line" change to switch.
reply
I have a CLAUDE.md symlinked to AGENTS.md
reply
ln -s CLAUDE.md AGENTS.md

There's your one line change.

reply
That doesn't handle Claude.md in subdirectories. It does handle Claude.md and other various settings in .claude.
reply
You mean Anthropic are the only ones refusing the de-facto standard despite a long-standing issue: https://github.com/anthropics/claude-code/issues/6235

And as others have said, it's a one-line fix. "Skills" etc. are another `ln -s`
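For nested CLAUDE.md files, the same symlink idea can be applied recursively. A minimal sketch, assuming your other tool actually picks up AGENTS.md in subdirectories (I haven't verified that for every harness); `link_agents_files` is a hypothetical helper name, not part of any tool.

```python
import os
from pathlib import Path


def link_agents_files(root: str) -> list[Path]:
    """Create an AGENTS.md symlink beside every CLAUDE.md under root.

    Any existing AGENTS.md (file or link) is left untouched.
    """
    created = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "CLAUDE.md" in filenames:
            link = Path(dirpath) / "AGENTS.md"
            if not link.exists() and not link.is_symlink():
                # Relative target, so the link survives moving the repo.
                link.symlink_to("CLAUDE.md")
                created.append(link)
    return created


if __name__ == "__main__":
    for path in link_agents_files("."):
        print("linked", path)
```

Running it twice is harmless, since directories that already have an AGENTS.md are skipped.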

reply
I don't have much quality drop from 4.6. But I also notice that I use codex more often these days than claude code
reply
It's been shockingly bad for me. For another example, when asked to make a new Python script building off an existing one, for some cursed reason the model chose to .read() the .py files, use 100s of lines of regex to try to patch the changes in, and exec'd everything at the end...
reply
Hate that about Claude Code. I have been adding permissions for it to do everything that makes sense to add when it comes to editing files, but way too often it will generate 20-30 line bash snippets using sed to do the edits instead, and then the whole permission system breaks down. It means I have to babysit it all the time to make sure no random permission prompts pop up.
reply
I generally think codex is doing well until I come in with my Opus sweep to clean it up. Claude just codes closer to the way my brain works. codex is great at finding numerical stability issues though and increasingly I like that it waits for an explicit push to start working. But talking to Claude Code the way I learned to talk to codex seems to work also so I think a lot of it is just learning curve (for me).
reply
Usually the problems that cause this kind of thing are:

1) Bad prompt/context. No matter what the model is, the input determines the output. This is a really big subject as there's a ton of things you can do to help guide it or add guardrails, structure the planning/investigation, etc.

2) Misaligned model settings. If temperature/top_p/top_k are too high, you will get more hallucination and possibly loops. If they're too low, you don't get "interesting" enough results. Same for the repeat protection settings.

I'm not saying it didn't screw up, but it's not really the model's fault. Every model has the potential for this kind of behavior. It's our job to do a lot of stuff around it to make it less likely.

The agent harness is also a big part of it. Some agents have very specific restrictions built in, like max number of responses or response tokens, so you can prevent it from just going off on a random tangent forever.
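For anyone unfamiliar with the sampling settings in point 2, here is a toy sketch of what temperature, top_k and top_p do to a next-token distribution. This is a simplified illustration over a plain list of logits, not how any particular provider implements it, and `sample_token` is a made-up name.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Sample an index from logits with temperature, top-k and top-p filtering."""
    if temperature <= 0:
        # Treat zero temperature as greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Higher temperature flattens the distribution, lower sharpens it.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank tokens from most to least likely, then keep at most top_k of them.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]

    # Keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]
```

With loose settings the unlikely tail tokens stay in play, which is where the extra randomness comes from; with tight settings only the top few candidates can ever be picked.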

reply
so even with a new tokenizer that can map to more tokens than before, their answer is still just "you're not managing your context well enough"

"Opus 4.7 uses an updated tokenizer that [...] can map to more tokens—roughly 1.0–1.35× depending on the content type.

[...]

Users can control token usage in various ways: by using the effort parameter, adjusting their task budgets, or prompting the model to be more concise."
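Back-of-the-envelope, assuming limits are counted in tokens and the quoted 1.0–1.35× factor applied uniformly (it depends on content type), a fixed budget holds this much less content. The budget figure is just an illustrative number.

```python
def effective_budget(budget_tokens: int, inflation: float) -> int:
    """Old-tokenizer-equivalent content that fits in a fixed token budget."""
    return int(budget_tokens / inflation)


for factor in (1.0, 1.35):
    print(factor, effective_budget(1_000_000, factor))
```

At the top of the quoted range, the same budget covers roughly 74% of the content it used to.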

reply
That's wild that you think 4.6 is bad... Each model has its strengths and weaknesses. I find that Codex is good for architectural design and Claude is actually better at the engineering and building.
reply
We've started calling it dopus at work :(
reply
I enjoy switching back and forth and having multi-agent reviews. I'm enjoying Codex also but having options is the real win.
reply
For me, making it high effort just fixed all the quality problems, and even cut down on token use somehow
reply
This. They kind of snuck this into the release notes: switching the default effort level to Medium. High is significantly slower, but that’s somewhat mitigated by the fact that you don’t have to constantly act like a helicopter parent for it.
reply
Yup, they recommend a minimum of high for coding now, and cranked the default up to extra high.
reply
I do feel that CC sometimes starts doing dumb tasks or asking for approval for things that usually don’t really need it. Like extra syntax checks, or some greps/text parsing basic commands
reply
Exactly. Why do they ask permission for read-only operations?! You either run with --dangerously-skip-permissions or you come back after 30 minutes to find it waiting for permission to run grep. There's no middle ground, at least not that Claude CLI users have access to.
reply
Before opus released we also saw huge backlash with it being dumber.

Perhaps they need the compute for the training

reply
Strange. Opus 4.6 has been great for me. On Max 20x
reply
I've noticed the same over the last two weeks. Some days Claude will just entirely lose its marbles. I pay for Claude and Codex so I just end up needing to use codex those days and the difference is night and day.
reply
Same! I thought people were exaggerating how bad Claude has gotten until it deleted several files by accident yesterday

Codex isn’t as pretty in output but gets the job done much more consistently

reply
Meh. At $work we were on CC for one month, then switched to Codex for one month, and now will be on CC again to test. We haven’t seen any obvious difference between CC and Codex; both are sometimes very good and sometimes very stupid. You have to test for a long time, not just test one day and call it a benchmark just because you have a single example.
reply
I've been raging pretty hard too. Thought either I'm getting cleverer by the day or Claude has been slipping and sliding toward the wrong side of the "smart idiot" equation pretty fast.

Have caught it flat-out skipping 50% of tasks and lying about it.

reply
codex low-key seems to be better than claude. and i say this as an 18-hour-a-day user of both (mostly claude)
reply
Anecdotally, codex has been burning through way more tokens for me lately. Claude seems to just sit and spin for a long time doing nothing, but at least token use is moderate.

All options are starting to suck more and more

reply
Same for me.

I cancelled my subscription and will be moving to Codex for the time being.

Tokens are way too opaque and Claude was way smarter for my work a couple of months ago.

reply
I tried codex, but I hate 5.4's personality as a partner. It's a demon debugger though. But working closely with it, it's so smug and annoying.
reply
How do you get codex to generate any code?

I describe the problem and codex runs in circles basically:

codex> I see the problem clearly. Let me create a plan so that I can implement it. The plan is X, Y, Z. Do you want me to implement this?

me> Yes please, looks good. Go ahead!

codex> Okay. Thank you for confirming. So I am going to implement X, Y, Z now. Shall I proceed?

me> Yes, proceed.

codex> Okay. Implementing.

...codex is working... you see the internal monologue running in circles

codex> Here is what I am going to implement: X, Y, Z

me> Yes, you said that already. Go ahead!

codex> Working on it.

...codex is doing something...

codex> After examining the problem more, indeed, the steps should be X, Y, Z. Do you want me to implement them?

etc.

Pretty much every session ends up being like this. I have been unable to get any useful code apart from boilerplate JS from it since 5.4.

So instead I just use ChatGPT to create a plan and then ask Opus to code, but it's hit and miss. Almost every time the prompt seems to be routed to a cheaper model that is very dumb (but says Opus 4.6 when asked). I have to start a new session many times until I get a good model.

reply
It's just like subscription based MMORPGs that delay you as much as possible every step of the way because that's the way they can extract more money from you. If you pay for the tokens it's not in their benefit to give you the answer directly.
reply
Do you have to put it in a build/execute mode (separate from a planning mode) to allow it to move on? I use opencode, and that's how it works.
reply
Weird. I never had that issue when writing code.
reply
Yep, I'll wait for the GPT answer to this. If we're lucky OpenAI will release a new GPT 5.5 or whatever model in the next few days, just like the last round.

I have been getting better results out of codex on and off for months. It's more "careful" and systematic in its thinking. It makes less "excuses" and leaves less race conditions and slop around. And the actual codex CLI tool is better written, less buggy and faster. And I can use the membership in things like opencode etc without drama.

For March I decided to give Claude Code / Opus a chance again. But there's just too much variance there. And then they started to play games with limits, and then OpenAI rolled out a $100 plan to compete with Anthropic's.

I'm glad to see the competition but I think Anthropic has pissed in the well too much. I do think they sent me something about a free month and maybe I will use that to try this model out though.

reply
I’ve been on the Claude Code train for a while but decided to try Codex last week after they announced the $100 USD Pro plan.

I’ve been pretty happy with it! One thing I immediately like more than Claude is that Codex seems much more transparent about what it’s thinking and what it wants to do next. I find it much easier to interrupt or jump in the middle if things are going in the wrong direction.

Claude Code has been slowly turning into this mysterious black box, wiping out terminal context any time it compacts a conversation (which I think is their hacky way of dealing with terminal flickering issues — which is still happening, 14 months later), going out of the way to hide thought output, and then of course the whole performance issues thing.

Excited to try 4.7 out, but man, Codex (as a harness at least) is a stark contrast to Claude Code.

reply
> One thing I immediately like more than Claude is that Codex seems much more transparent about what it’s thinking and what it wants to do next. I find it much easier to interrupt or jump in the middle if things are going in the wrong direction.

I've finally started experimenting recently with Claude's --dangerously-skip-permissions and Codex's --dangerously-bypass-approvals-and-sandbox through external sandboxing tools. (For now just nono¹, which I really like so far, and soon via containerization or virtual machines.)

When I am using Claude or Codex without external sandboxing tools and just using the TUI, I spend a lot of time approving individual commands. When I was working that way, I found Codex's tendency to stop and ask me whether/how it should proceed extremely annoying. I found myself shouting at my monitor, "Yes, duh, go do the thing!".

But when I run these tools without having them ask me for permission for individual commands or edits, I sometimes find Claude has run away from me a little and made the wrong changes or tried to debug something in a bone-headed way that I would have redirected with an interruption if it had stopped to ask me for permissions. I think maybe Codex's tendency to stop and check in may be more valuable if you're relying on sandboxing (external or built-in) so that you can avoid individual permissions prompts.

--

1: https://nono.sh/

reply
There is a new flag for terminal flickering issues:

> Claude Code v2.1.89: "Added CLAUDE_CODE_NO_FLICKER=1 environment variable to opt into flicker-free alt-screen rendering with virtualized scrollback"

reply
Such an interesting choice for a flag name. NO_BUG_PLEASE=1
reply
There is an official codex plugin for claude. I just have them do adversarial reviews/implementations, etc. with each other. It adds a bit of time to the workflow, but once you have the permissions sorted it'll just engage codex when necessary.
reply
Do this -- take your coworker's PRs that they've clearly written in Claude Code, and have Codex/GPT 5.4 review them.

Or have Codex review your own Claude Code work.

It then becomes clear just how "sloppy" CC is.

I wouldn't mind having Opus around in my back pocket to yeet out whole net new greenfield features. But I can't trust it to produce well-engineered things to my standards. Not that anybody should trust an LLM to that level, but there's matters of degree here.

reply
I've been using Claude and Codex in tandem ($100 CC, $20 Codex), and have made heavy use of claude-co-commands [0] to make them talk. Outside of the last 1-2 weeks (which we now have confirmation YET AGAIN that Claude shits the fucking bed in the run-up to a new model release), I usually will put Claude on max + /plan to gin up a fever dream to implement. When the plan is presented, I tell it to /co-validate with Codex, which tends to fill in many implementation gaps. Claude then codes the amended plan and commits, then I have a Codex skill that reviews the commit for gaps, missed edge cases, incorrect implementation, missed optimizations, etc., and fixes them. This had been working quite well up until the beginning of the month, when Claude more or less got CTE, and after a week of that I swapped to $100 Codex, $20 CC plans. Now I'm using co-validation a lot less and just driving primarily via Codex. When Claude works, it provides some good collaborative insights and counter-points, but Codex at the very least is consistently predictable (for text-oriented, data-oriented stuff -- I don't use either for designing or implementing frontend / UI / etc).

As always, YMMV!

[0] https://github.com/SnakeO/claude-co-commands

reply
Some variation of this is the way.

You should not get dependent on one black box. Companies will exploit that dependency.

My version of this is having CC Pro, Cursor Pro, and OpenCode (with $10 to Codex/GLM 5.1) --> total $50. My work doesn't stop if one of these is having overloaded servers, etc. And it's definitely useful to have them cross-checking each other's plans and work.

reply
This more or less mimics a flow that I had fairly good results from -- but I'm unwilling to pay for both right now unless I had a client or employer willing to foot the bill.

Claude Code as "author" and a $20 Codex as reviewer/planner/tester has worked for me to squeeze better value out of the CC plan. But with the new $100 codex plan, and with the way Anthropic seemed to nerf their own $100 plan, I'm not doing this anymore.

reply
> It then becomes clear just how "sloppy" CC is.

Have you done the reverse? In my experience models will always find something to criticize in another model's work.

reply
I have, and in fact models will find things to criticize in their own work, too, so it's good to iterate.

But I've had the best results with GPT 5.4

reply
It cuts both ways. What I usually do these days is to let codex write code, then use claude code /simplify, have both codex and claude code review the PR, then finally manually review and fixup things myself. It's still ~2x faster than doing everything by myself.
reply
I often work this way too, but I'll say this:

This flow is exhausting. A day of working this way leaves me much more drained than traditional old school coding.

reply
100%. On days when I'm sleep deprived (once or twice a week), I fall back to this flow. On regular days, I tend to write more code the old school way and use these things for review.
reply
What bothers me with codex cli is that it feels like it should be more observable, more open and verbose about what the model is doing per step, being an open source product and OpenAI seemingly being actually open for once. But then it does a tool call - "Read $file" - and I have no idea whether it read the entire file or a specific chunk of it. Claude cli shows you everything the model is doing unless it's in a subagent (which is why I never use subagents).
reply