> 2. Old sessions had the thinking tokens stripped, resuming the session made Claude stupid (took 15 days to notice and remediate)

This one was egregious: after a one-hour user pause, apparently they cleared the cache and then continued to apply “forgetting” for the rest of the session after the resume!

Seems like a very basic software engineering error that would be caught by normal unit testing.

reply
To be fair to Anthropic, they did not intentionally degrade performance.

To take the opposite side, this is the quality of software you get atm when your org is all in on vibe coding everything.

reply
Are you saying dropping cache after 1 hour is not intentionally degrading performance?
reply
Yes. Caching is a cost optimization, not a response-quality metric.
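To illustrate the "cost optimization" point: a cache hit changes what a turn costs to serve, not what the model returns. A toy sketch with made-up placeholder prices (nothing here reflects real pricing or any real API):

```python
# Illustrative only: both prices are hypothetical placeholders.
PRICE_INPUT = 3.00        # $ per million fresh input tokens (made up)
PRICE_CACHE_READ = 0.30   # $ per million cached input tokens (made up)

def turn_cost(prompt_tokens, cached_tokens):
    """Cost of one turn: the cached prefix is billed at the cheap read
    rate, the remainder at the full input rate. The response itself is
    identical either way."""
    fresh = prompt_tokens - cached_tokens
    return (fresh * PRICE_INPUT + cached_tokens * PRICE_CACHE_READ) / 1_000_000

# A 100k-token conversation, resumed within the cache window vs. after expiry:
print(turn_cost(100_000, 95_000))  # mostly cheap cache reads
print(turn_cost(100_000, 0))       # cache expired: full price, same answer
```

Dropping the cache after an hour just moves you from the first line to the second; it shouldn't change the tokens the model sees, only the bill.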
reply
But it still degrades performance.
reply
It's unfortunate that the word "performance" is overloaded: ML folks have a specific definition that isn't what the rest of CS uses. But I understand Anthropic to mean response quality when they say this, not any other dimension you could measure performance on.

You can argue they're lying, but I think this is just folks misunderstanding what Anthropic is saying.

reply
None of these problems equate to degrading model performance. Completely different team. Degraded CC harness, sure.
reply
Sure, but it gives the impression of degraded model performance. Especially when the interface is still saying the model is operating on "high", the same as it did yesterday, yet it is in "medium" -- it just looks like the model got hobbled.
reply
Oh, absolutely. Though changes in how the model is used are eminently more fixable than the model itself.
reply
Yes, but for many users, CC is the product. Especially since I'm not allowed(?) to use my own harness with my sub.
reply
deleted
reply
> Anthropic publicly gaslights their user-base: "we never degrade model performance" is frustrating.

They're not gaslighting anyone here: they're very clear that the model itself, as in Opus 4.7, was not degraded in any way (i.e. if you take them at their word, they do not drop to lower quantisations of Claude during peak load).

However, the infrastructure around it - Claude Code, etc - is very much subject to change, and I agree that they should manage these changes better and ensure that they are well-communicated.

reply
Degrading model performance at inference in a data center vs. stripping thinking tokens: effectively the same thing.

Sure, they didn't change the GPUs they're running or the quantization, but if valuable information is removed and the models perform worse as a result, performance was degraded.

In the same way uptime doesn't care about the incident cause... if you're down, you're down; no one cares that it was 'technically DNS'.

reply
I thought these days thinking tokens sent by the model (as opposed to those used internally) were just for the user's benefit. When you send the convo back, you have to strip the thinking stuff for the next turn. Or is that just local models?
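For what it's worth, a minimal sketch of what "stripping the thinking stuff for the next turn" could look like, assuming a content-block transcript format (the field names here are made up, not any specific API's):

```python
def strip_thinking(messages):
    """Return a copy of the transcript with "thinking" blocks removed
    from assistant turns; user turns pass through unchanged."""
    cleaned = []
    for msg in messages:
        if msg["role"] == "assistant" and isinstance(msg["content"], list):
            # Keep only the non-thinking blocks (e.g. the visible text).
            content = [b for b in msg["content"] if b.get("type") != "thinking"]
            cleaned.append({**msg, "content": content})
        else:
            cleaned.append(msg)
    return cleaned

history = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": [
        {"type": "thinking", "text": "Simple arithmetic..."},
        {"type": "text", "text": "4"},
    ]},
]

print(strip_thinking(history))
```

If the harness does this silently on resume (or starts doing it mid-session when it didn't before), the model sees a different context than the user assumes, which is exactly the complaint upthread.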
reply
Claude Code is not infra; the model is the infra. They changed settings to make their models faster and probably cheaper to run too. Honestly, with adaptive thinking it no longer matters what model it is if you can dynamically make it do less or more work.
reply