> 2. Old sessions had the thinking tokens stripped, resuming the session made Claude stupid (took 15 days to notice and remediate)

This one was egregious: after a one-hour user pause, apparently they cleared the cache and then continued to apply “forgetting” for the rest of the session after the resume!

Seems like a very basic software engineering error that would be caught by normal unit testing.

reply
To be fair to Anthropic, they did not intentionally degrade performance.

To take the opposite side, this is the quality of software you get atm when your org is all in on vibe coding everything.

reply
Are you saying dropping cache after 1 hour is not intentionally degrading performance?
reply
Yes. Caching is a cost optimization, not a response-quality metric.
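To illustrate the "cost optimization" point: a cache hit changes what a turn costs to serve, not what the model returns. A toy sketch with made-up placeholder prices (nothing here reflects real pricing or any real API):

```python
# Illustrative only: both prices are hypothetical placeholders.
PRICE_INPUT = 3.00        # $ per million fresh input tokens (made up)
PRICE_CACHE_READ = 0.30   # $ per million cached input tokens (made up)

def turn_cost(prompt_tokens, cached_tokens):
    """Cost of one turn: the cached prefix is billed at the cheap read
    rate, the remainder at the full input rate. The response itself is
    identical either way."""
    fresh = prompt_tokens - cached_tokens
    return (fresh * PRICE_INPUT + cached_tokens * PRICE_CACHE_READ) / 1_000_000

# A 100k-token conversation, resumed within the cache window vs. after expiry:
print(turn_cost(100_000, 95_000))  # mostly cheap cache reads
print(turn_cost(100_000, 0))       # cache expired: full price, same answer
```

Dropping the cache after an hour just moves you from the first line to the second; it shouldn't change the tokens the model sees, only the bill.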
reply
But it still degrades performance.
reply
It's unfortunate that the word "performance" is overloaded: ML folks have a specific definition that isn't what the rest of CS uses. But I understand Anthropic to mean response quality when they say this, not any other dimension you could measure performance on.

You can argue they're lying, but I think this is just folks misunderstanding what Anthropic is saying.

reply
None of these problems equate to degrading model performance. Completely different team. Degraded CC harness, sure.
reply
Sure, but it gives the impression of degraded model performance. Especially when the interface is still saying the model is operating on "high", the same as it did yesterday, yet it is in "medium" -- it just looks like the model got hobbled.
reply
Oh, absolutely. Though changes in how the model is used are eminently more fixable than the model itself.
reply
Yes, but for many users, CC is the product. Especially since I'm not allowed(?) to use my own harness with my sub.
reply
deleted
reply
> Anthropic publicly gaslights their user-base: "we never degrade model performance" is frustrating.

They're not gaslighting anyone here: they're very clear that the model itself, as in Opus 4.7, was not degraded in any way (i.e. if you take them at their word, they do not drop to lower quantisations of Claude during peak load).

However, the infrastructure around it - Claude Code, etc - is very much subject to change, and I agree that they should manage these changes better and ensure that they are well-communicated.

reply
Degrading model performance at inference in a data center vs. stripping thinking tokens: effectively the same thing.

Sure, they didn't change the GPUs they're running or the quantization, but if valuable information is removed and the models perform worse as a result, performance was degraded.

In the same way uptime doesn't care about the incident cause... if you're down, you're down; no one cares that it was 'technically DNS'.

reply
I thought these days thinking tokens sent by the model (as opposed to those used internally) were just for the user's benefit. When you send the convo back, you have to strip the thinking stuff for the next turn. Or is that just local models?
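For what it's worth, a minimal sketch of what "stripping the thinking stuff for the next turn" could look like, assuming a content-block transcript format (the field names here are made up, not any specific API's):

```python
def strip_thinking(messages):
    """Return a copy of the transcript with "thinking" blocks removed
    from assistant turns; user turns pass through unchanged."""
    cleaned = []
    for msg in messages:
        if msg["role"] == "assistant" and isinstance(msg["content"], list):
            # Keep only the non-thinking blocks (e.g. the visible text).
            content = [b for b in msg["content"] if b.get("type") != "thinking"]
            cleaned.append({**msg, "content": content})
        else:
            cleaned.append(msg)
    return cleaned

history = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": [
        {"type": "thinking", "text": "Simple arithmetic..."},
        {"type": "text", "text": "4"},
    ]},
]

print(strip_thinking(history))
```

If the harness does this silently on resume (or starts doing it mid-session when it didn't before), the model sees a different context than the user assumes, which is exactly the complaint upthread.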
reply
Claude Code is not infra; the model is the infra. They changed settings to make their models faster and probably cheaper to run too. Honestly, with adaptive thinking it no longer matters what model it is if you can dynamically make it do less or more work.
reply