Stealthily degrade the model, or stealthily constrain the model with a tighter harness? Coding tools like Claude Code were created to overcome the shortcomings of last year's models. Models have gotten better, but the harnesses have not been rebuilt from scratch to reflect the improved planning and tool use inherent to newer models.

I do wonder how much of the engineering put into these coding tools may, in some cases, actually degrade coding performance relative to simpler instructions and plain terminal access. Not to mention that the monthly subscription pricing structure incentivizes building the harness to reduce token use. How much of that token efficiency is to the benefit of the user? Someone needs to be doing research comparing, e.g., Claude Code against generic code assistance via API access with minimal tooling and instructions.
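To illustrate what I mean by "minimal tooling": a baseline like the sketch below would be a fair comparison point. (`query_model` here is a hypothetical stand-in for whatever API client you'd wire up, not any real SDK call.)

```python
import subprocess

def minimal_agent(task, query_model, max_turns=20):
    """Bare-bones harness: hand the model a task plus raw shell access.

    `query_model` is a hypothetical stand-in for an API client; given the
    transcript so far, it must return either {"run": "<shell command>"}
    or {"done": "<final answer>"}.
    """
    transcript = [f"Task: {task}"]
    for _ in range(max_turns):
        action = query_model("\n".join(transcript))
        if "done" in action:
            return action["done"]
        # Run the model's command and feed stdout/stderr back verbatim.
        result = subprocess.run(action["run"], shell=True,
                                capture_output=True, text=True)
        transcript.append(f"$ {action['run']}\n{result.stdout}{result.stderr}")
    return None
```

A few dozen lines of loop plus a system prompt; if the elaborate harness can't beat this, the engineering isn't paying for itself.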

reply
I've been using pi.dev since December. The only significant change to the harness in that time that affects my usage is the availability of parallel tool calls. Yet Claude models have become unusable in the past month, for many of the reasons observed here. Conclusion: it's not the harness.

I tend to agree that the legacy workarounds are actively harmful, though. I tried out the Zed agent for a while and I was SHOCKED at how bad its edit tool is compared to the search-and-replace tool in pi. I didn't find a single frontier model capable of using it reliably. By forking edits off, it completely decouples the models' thinking from their edits and then erases the evidence from their context. Agents ended up believing that a less capable subagent was making editing mistakes.
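For contrast, a search-and-replace edit tool is almost trivially simple. Here's a rough sketch of the general technique (my own assumption about the shape of it, not pi's actual implementation):

```python
def apply_edit(path, old, new):
    """Search-and-replace edit: the model supplies `old` verbatim from the
    file and `new` as its replacement, so the edit happens inside the same
    context that reasoned about it; nothing is forked off to a subagent.
    """
    with open(path) as f:
        text = f.read()
    count = text.count(old)
    if count != 1:
        # Make the model disambiguate instead of guessing which match.
        raise ValueError(f"expected exactly 1 match for old text, found {count}")
    with open(path, "w") as f:
        f.write(text.replace(old, new, 1))
```

The point is that failure is loud and in-context: the model sees exactly why an edit bounced, rather than blaming a phantom subagent.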

reply
deleted
reply
Are you using Pi with a cloud subscription, or are you using the API?
reply
Out of curiosity, what can parallel tool calls do that one can't do with parallel subagents and background processes?
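For concreteness, by "parallel tool calls" I mean the harness fanning out every call from a single model turn and returning the results in one batch, roughly like this (with a hypothetical `tools` registry of plain callables):

```python
from concurrent.futures import ThreadPoolExecutor

def run_tool_calls(calls, tools):
    """Execute every tool call emitted in one model turn concurrently.

    `calls` is a list of (tool_name, kwargs) pairs from the model's turn;
    `tools` maps names to plain callables (a hypothetical registry).
    Results come back in order, in one batch, for the next model turn;
    no subagent context or background-process bookkeeping involved.
    """
    with ThreadPoolExecutor(max_workers=max(len(calls), 1)) as pool:
        futures = [pool.submit(tools[name], **kwargs) for name, kwargs in calls]
        return [f.result() for f in futures]
```

Subagents and background processes can reach the same end state, but they cost extra context and coordination; this is just N reads finishing in one round trip.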
reply
I feel like a "feature/model freeze" may be justified

just call it something like "[month] [year] edition" and work on the next release

users spend effort arriving at a narrow peak of performance, but every change keeps moving that peak sideways

reply
The changes to reduce inference costs are intentional. The last thing you're going to do is let users linger on an older version that spends much more. This is essentially what's going on, with layers upon layers of social engineering on top of it.
reply
Love your point. Instructions found to be good by trial and error for one LLM may not be good for another LLM.
reply
> Love your point. Instructions found to be good by trial and error for one LLM may not be good for another LLM.

Well, according to this story, instructions refined by trial and error over months might be good for one LLM on Tuesday, and then be bad for the same LLM on Wednesday.

reply
Agree: it's Anthropic's aggressive changes to the harnesses and to the hidden base prompt we users never see. Clearly intended to give long-right-tail users a haircut.
reply
Disconcerting for sure, but from a business point of view you can understand where they're at; afaiui they're still losing money on basically every query while simultaneously under huge pressure to show that they can (a) deliver this product sustainably at (b) a price point affordable to basically everyone (e.g., market penetration similar to smartphones).

The constraint of (b) keeps them from raising the price, so meeting (a) means making the product worse, and maybe eventually running a price-discrimination play with premium tiers that are faster and smarter for 10x the cost. But anything done now that erodes the market's trust in their delivery makes that eventual premium tier a harder sell.

reply
They'll never get anyone on board if the product can't be trusted to not suck.

And idk about the pricing thing. Right now I waste multiple dollars on a 40-minute response that is useless. Why would I ever use this product?

reply
Yeah. I've been enjoying programming with Claude so much that I started feeling the need to upgrade to Max. Then it turns out even big companies paying API premiums are getting an intentionally degraded, inferior model. I don't want to pay for Opus if I can't trust what it says.
reply
ChatGPT has been doing the same thing consistently for years. A model starts out smooth: it takes a while, but produces (relatively) good results. Within a few weeks, responses start arriving much more quickly, at poorer quality.
reply
people have been complaining about this since GPT-4 and have never been able to provide any evidence (even though they have all their old conversations in their chat history). I think it's simply new-model shininess turning into raised expectations after some amount of time.
reply
I would have thought so too. But my n=1 has CC solving pretty much the same task today as about two weeks ago, with drastically degraded results.

The background being that we scrapped working on a feature and then started again a sprint later.

In my cynicism, I find it more likely that a massively unprofitable LLM company is cutting costs at any price than that everyone else is suffering from a collective delusion.

reply
I agree with you. I too complain about this same phenomenon with my colleagues, and we always arrive at the same conclusion: it’s probably us just expecting more and more over time.
reply
First time interacting with a corporation in America?
reply
With an AI corporation, yes. I subscribed during the promotional 2x usage period. Anthropic's reputation as a more ethical alternative to OpenAI factored heavily in that decision. I'm very disappointed.
reply
Ethics don't mean anything when talking about corporations. Their good-guy persona is itself a marketing stunt.

https://news.ycombinator.com/item?id=47633396#47635060

reply
Perhaps the subscription part of the business is so heavily subsidized that they have no choice but to reduce the cost.
reply
Or they don’t have enough compute to handle the recent influx of traffic. I’m guessing it’s a bit of both.
reply
It's disconcerting. But in 2026 it's not very surprising.
reply
It seems likely to me that they are moving compute power to the new models they are creating.
reply
Seems like the logical conclusion, no matter what.
reply
> effectively pulling the rug from under their customers.

This is the whole point of AI. It's a black box that they can completely control.

reply
I hope local models advance to the point they can match Opus one day...
reply
If OP is correct, Opus has regressed to a point where local models are already on par with it.
reply
Considering the advances in software and hardware, I would expect that in 2 or 3 years.

And I hope we will eventually reach a point where models become "good enough" for certain tasks, and we won't have to replace them every 6 months.

(That would be similar to the evolution of other technologies like personal computers and smartphones.)

reply
We've been saying this since ChatGPT 3. People will never be content with local models.
reply
I still think it's a live possibility that there's simply a finite latent space of tasks each model is amenable to, and models seem to get worse as we mine them out. (The source link claims this is associated with "the rollout of thinking content redaction", but also that observable symptoms began before that rollout, so I wouldn't particularly trust its diagnosis even without the LLM psychosis bit at the end.)
reply
[dead]
reply
If you think that’s brutal, wait until you hear about how fiat currency works
reply