Stealthily degrade the model, or stealthily constrain the model with a tighter harness? Coding tools like Claude Code were created to overcome the shortcomings of last year's models. Models have gotten better, but the harnesses have not been rebuilt from scratch to reflect the improved planning and tool use inherent to newer models.

I do wonder how much of the engineering put into these coding tools may, in some cases, actually degrade coding performance relative to simpler instructions and plain terminal access. Not to mention that the monthly subscription pricing structure incentivizes building the harness to reduce token use. How much of that token efficiency is to the benefit of the user? Someone needs to be doing research comparing, e.g., Claude Code against generic code assistance via API access with minimal tooling and instructions.
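To illustrate what I mean by "minimal tooling": a baseline like the sketch below would be a fair comparison point. (`query_model` here is a hypothetical stand-in for whatever API client you'd wire up, not any real SDK call.)

```python
import subprocess

def minimal_agent(task, query_model, max_turns=20):
    """Bare-bones harness: hand the model a task plus raw shell access.

    `query_model` is a hypothetical stand-in for an API client; given the
    transcript so far, it must return either {"run": "<shell command>"}
    or {"done": "<final answer>"}.
    """
    transcript = [f"Task: {task}"]
    for _ in range(max_turns):
        action = query_model("\n".join(transcript))
        if "done" in action:
            return action["done"]
        # Run the model's command and feed stdout/stderr back verbatim.
        result = subprocess.run(action["run"], shell=True,
                                capture_output=True, text=True)
        transcript.append(f"$ {action['run']}\n{result.stdout}{result.stderr}")
    return None
```

A few dozen lines of loop plus a system prompt; if the elaborate harness can't beat this, the engineering isn't paying for itself.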

reply
I've been using pi.dev since December. The only significant change to the harness in that time that affects my usage is the availability of parallel tool calls. Yet Claude models have become unusable in the past month, for many of the reasons observed here. Conclusion: it's not the harness.

I tend to agree that the legacy workarounds are actively harmful, though. I tried out the Zed agent for a while and I was SHOCKED at how bad its edit tool is compared to the search-and-replace tool in pi. I didn't find a single frontier model capable of using it reliably. By forking edits off, it completely decouples the models' thinking from their edits and then erases the evidence from their context. Agents ended up believing that a less capable subagent was making editing mistakes.
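For contrast, a search-and-replace edit tool is almost trivially simple. Here's a rough sketch of the general technique (my own assumption about the shape of it, not pi's actual implementation):

```python
def apply_edit(path, old, new):
    """Search-and-replace edit: the model supplies `old` verbatim from the
    file and `new` as its replacement, so the edit happens inside the same
    context that reasoned about it; nothing is forked off to a subagent.
    """
    with open(path) as f:
        text = f.read()
    count = text.count(old)
    if count != 1:
        # Make the model disambiguate instead of guessing which match.
        raise ValueError(f"expected exactly 1 match for old text, found {count}")
    with open(path, "w") as f:
        f.write(text.replace(old, new, 1))
```

The point is that failure is loud and in-context: the model sees exactly why an edit bounced, rather than blaming a phantom subagent.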

reply
deleted
reply
Are you using Pi with a cloud subscription, or are you using the API?
reply
Out of curiosity, what can parallel tool calls do that one can't do with parallel subagents and background processes?
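For concreteness, by "parallel tool calls" I mean the harness fanning out every call from a single model turn and returning the results in one batch, roughly like this (with a hypothetical `tools` registry of plain callables):

```python
from concurrent.futures import ThreadPoolExecutor

def run_tool_calls(calls, tools):
    """Execute every tool call emitted in one model turn concurrently.

    `calls` is a list of (tool_name, kwargs) pairs from the model's turn;
    `tools` maps names to plain callables (a hypothetical registry).
    Results come back in order, in one batch, for the next model turn;
    no subagent context or background-process bookkeeping involved.
    """
    with ThreadPoolExecutor(max_workers=max(len(calls), 1)) as pool:
        futures = [pool.submit(tools[name], **kwargs) for name, kwargs in calls]
        return [f.result() for f in futures]
```

Subagents and background processes can reach the same end state, but they cost extra context and coordination; this is just N reads finishing in one round trip.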
reply
I feel like a "feature/model freeze" may be justified

just call it something like "[month] [year] edition" and work on the next release

users spend effort arriving at a narrow peak of performance, but every change keeps moving that peak sideways

reply
The changes to reduce inference costs are intentional. The last thing you're going to do is let users linger on an older version that spends much more. This is essentially what's going on, with layers upon layers of social engineering on top of it.
reply
Love your point. Instructions found to be good by trial and error for one LLM may not be good for another LLM.
reply
> Love your point. Instructions found to be good by trial and error for one LLM may not be good for another LLM.

Well, according to this story, instructions refined by trial and error over months might be good for one LLM on Tuesday, and then be bad for the same LLM on Wednesday.

reply
Agree: it's Anthropic's aggressive changes to the harnesses and to the hidden base prompt we users never see. Clearly intended to give long-right-tail users a haircut.
reply
Disconcerting for sure, but from a business point of view you can understand where they're at; afaiui they're still losing money on basically every query while simultaneously under huge pressure to show that they can (a) deliver this product sustainably at (b) a price point affordable to basically everyone (e.g., market penetration similar to smartphones).

The constraint of (b) keeps them from raising the price, so meeting (a) means making the product worse, and maybe eventually running a price-discrimination play with premium tiers that are faster and smarter for 10x the cost. But anything done now that erodes the market's trust in their delivery makes that eventual premium tier a harder sell.

reply
They'll never get anyone on board if the product can't be trusted to not suck.

And idk about the pricing thing. Right now I waste multiple dollars on a 40-minute response that is useless. Why would I ever use this product?

reply
Yeah. I've been enjoying programming with Claude so much that I started feeling the need to upgrade to Max. Then it turns out even big companies paying API premiums are getting an intentionally degraded, inferior model. I don't want to pay for Opus if I can't trust what it says.
reply
ChatGPT has been doing the same thing consistently for years. A model starts out smooth: it takes a while, but produces (relatively) good results. Within a few weeks, responses start arriving much more quickly, at poorer quality.
reply
people have been complaining about this since GPT-4 and have never been able to provide any evidence (even though they have all their old conversations in their chat history). I think it's simply new-model shininess turning into raised expectations after some amount of time.
reply
I would have thought so too. But my n=1 has CC solving pretty much the same task today as about two weeks ago, with drastically degraded results.

The background being that we scrapped working on a feature and then started again a sprint later.

In my cynicism, I find it more likely that a massively unprofitable LLM company is cutting costs at any price than that everyone else is suffering from a collective delusion.

reply
I agree with you. I too complain about this same phenomenon with my colleagues, and we always arrive at the same conclusion: it’s probably us just expecting more and more over time.
reply
First time interacting with a corporation in America?
reply
With an AI corporation, yes. I subscribed during the promotional 2x usage period. Anthropic's reputation as a more ethical alternative to OpenAI factored heavily in that decision. I'm very disappointed.
reply
Ethics don't mean anything when talking about corporations. Their good-guy persona is itself a marketing stunt.

https://news.ycombinator.com/item?id=47633396#47635060

reply
Perhaps the subscription part of the business is so heavily subsidized that they have no choice but to reduce the cost.
reply
Or they don’t have enough compute to handle the recent influx of traffic. I’m guessing it’s a bit of both.
reply
It's disconcerting. But in 2026 it's not very surprising.
reply
It seems likely to me that they are moving compute power to the new models they are creating.
reply
Seems like the logical conclusion, no matter what.
reply
> effectively pulling the rug from under their customers.

This is the whole point of AI. It's a black box that they can completely control.

reply
I hope local models advance to the point they can match Opus one day...
reply
If OP is correct, Opus has regressed to a point where local models are already on par with it.
reply
Considering the advances in software and hardware, I would expect that in 2 or 3 years.

And I hope we will eventually reach a point where models become "good enough" for certain tasks, and we won't have to replace them every 6 months.

(That would be similar to the evolution of other technologies like personal computers and smartphones.)

reply
We've been saying this since ChatGPT 3. People will never be content with local models.
reply
I still think it's a live possibility that there's simply a finite latent space of tasks each model is amenable to, and models seem to get worse as we mine them out. (The source link claims this is associated with "the rollout of thinking content redaction", but also that observable symptoms began before that rollout, so I wouldn't particularly trust its diagnosis even without the LLM psychosis bit at the end.)
reply
[dead]
reply
If you think that’s brutal, wait until you hear about how fiat currency works
reply