undefined

upvote

points

by btown20 hours ago |

upvote

by CjHuber20 hours ago|

[-]

I think it’s crazy that they do this, especially without any notice. I would not have renewed my subscription if I knew that they started doing this.

Especially in the analysis part of my work I don‘t care about the actual text output itself most of the time but try to make the model „understand“ the topic.

In the first phase the actual text output itself is worthless it just serves as an indicator that the context was processed correctly and the future actual analysis work can depend on it. And they‘re… just throwing most the relevant stuff out all out without any notice when I resume my session after a few days?

This is insane, Claude literally became useless to me and I didn’t even know it until now, wasting a lot of my time building up good session context.

There would be nothing lost if they said „If you click yes, we will prune your old thinking making Claude faster and saving you tons of tokens“. Most people would say yes probably so why not ask them… make it an env variable (that is announced not a secretly introduced one to opt out of something new!) or at least write it in a change log if they really don’t want to allow people to use it like before, so there‘d be chance to cancel the subscription in time instead of wasting tons of time on work patterns that not longer work

reply

upvote

by munk-a19 hours ago|

[-]

Pointing at their terms of service will definitely be the instantly summoned defense (as would most modern companies) but the fact that SaaS can so suddenly shift the quality of product being delivered for their subscription without clear notification or explicitly re-enrollment is definitely a legal oversight right now and Italy actually did recently clamp down on Netflix doing this[1]. It's hard to define what user expectations of a continuous product are and how companies may have violated it - and for a long time social constructs kept this pretty in check. As obviously inactive and forgotten about subscriptions have become a more significant revenue source for services that agreement has been eroded, though, and the legal system has yet to catch up.

1. Specifically, this suite was about price increases without clear consideration for both parties - but the same justifications apply to service restrictions without corresponding price decreases.

https://fortune.com/2026/04/20/italian-court-netflix-refunds...

reply

upvote

by kiratp14 hours ago|

[-]

OpenAI does this for all API calls

> Our systems will smartly ignore any reasoning items that aren’t relevant to your functions, and only retain those in context that are relevant. You can pass reasoning items from previous responses either using the previous_response_id parameter, or by manually passing in all the output items from a past response into the input of a new one.

https://developers.openai.com/api/docs/guides/reasoning

Disclosure - work on AI@msft

reply

upvote

by jetbalsa19 hours ago|

[-]

So to defend a litte, its a Cache, it has to go somewhere, its a save state of the model's inner workings at the time of the last message. so if it expires, it has to process the whole thing again. most people don't understand that every message the ENTIRE history of the conversion is processed again and again without that cache. That conversion might of hit several gigs worth of model weights and are you expecting them to keep that around for /all/ of your conversions you have had with it in separate sessions?

reply

upvote

by 383629364819 hours ago|

[-]

No? It's not because it's a cache, it's because they're scared of letting you see the thinking trace. If you got the trace you could just send it back in full when it got evicted from the cache. This is how open weight models work.

reply

upvote

by mpyne17 hours ago|

[-]

The trace goes back fine, that's not the issue.

The issue is that if they send the full trace back, it will have to be processed from the start if the cache expired, and doing that will cause a huge one-time hit against your token limit if the session has grown large.

So what Boris talked about is stripping things out of the trace that goes back to regenerate the session if the cache expires. Doing this would help avert burning up the token limit, but it is technically a different conversation, so if CC chooses poorly on stripping parts of the context then it would lead to Claude getting all scatter-brained.

reply

upvote

by charcircuit8 hours ago|

[-]

>and doing that will cause a huge one-time hit against your token limit if the session has grown large.

Anthropic already profited from generating those tokens. They can afford subsidize reloading context.

reply

upvote

by reactordev19 hours ago|

[-]

They are sending it back to the cache, the part you are missing is they were charging you for it.

reply

upvote

by eknkc19 hours ago|

[-]

The blog post says they prune them now not to charge you. That’s the change they implemented.

reply

upvote

by reactordev18 hours ago|

[-]

right. they were charging you for it, now they aren't because they are just dropping your conversation history.

reply

upvote

by eknkc19 hours ago|

[-]

I’m not familiar with the Claude API but OpenAI has an encrypted thking messages option. You get something that you can send back but it is encrypted. Not available on Anthropic?

reply

upvote

by CjHuber18 hours ago|

[-]

No of course it’s unrealistic for them to hold the cache indefinitely and that’s not the point. You are keeping the session data yourself so you can continue even after cache expiry. The point I‘m making is that it made me very angry that without any announcement they changed behavior to strip the old thinking even when you have it in your session file. There is absolutely no reason to not ask the user about if they want this

And it’s part of a larger problem of unannounced changes it‘s just like when they introduced adaptive thinking to 4.6 a few weeks ago without notice.

Also they seem to be completely unaware that some users might only use Claude code because they are used to it not stripping thinking in contrast to codex.

Anyway I‘m happy that they saw it as a valid refund reason

reply

upvote

by rsfern19 hours ago|

[-]

It seems like an opportunity for a hierarchical cache. Instead of just nuking all context on eviction, couldn’t there be an L2 cache with a longer eviction time so task switching for an hour doesn’t require a full session replay?

reply

upvote

by cyanydeez18 hours ago|

[-]

what matters isn't that it's a cache; what matter is it's cached _in the GPU/NPU_ memory and taking up space from another user's active session; to keep that cache in the GPU is a nonstarter for an oversold product. Even putting into cold storage means they still have to load it at the cost of the compute, generally speaking because it again, takes up space from an oversold product.

reply

upvote

by 19 hours ago|

[-]

deleted

reply

upvote

by FireBeyond15 hours ago|

[-]

> There would be nothing lost if they said „If you click yes, we will prune your old thinking making Claude faster and saving you tons of tokens“. Most people would say yes probably so why not ask them

The irony is that Claude Design does this. I did a big test building a design system, and when I came back to it, it had in the chat window "Do you need all this history for your next block of work? Save 120K tokens and start a new chat. Claude will still be able to use the design system." Or words to that effect.

reply

upvote

by CjHuber14 hours ago|

[-]

This is exactly what also confused me. I had the exact same prompt in Claude code as well, and the no option implies you can also keep the whole history. But clicking keep apparently only ever kept the user and assistant messages not the whole actual thinking parts of the conversation

reply

upvote

by trinsic218 hours ago|

[-]

Why cant you just build a project document that outlines that prompt that you want to do? Or have claude save your progress in memory so you can pick it up later? Thats what I do. It seems abhorrent to expect to have a running prompt that left idle for long periods of time just so you can pick up at a moments whim...

reply

upvote

by Terretta17 hours ago|

[-]

You know that memory goes back into a prompt as context that wasn't cached, so... that just adds work.

Granted, the "memory" can be available across session, as can docs...

reply

upvote

by try-working16 hours ago|

[-]

recursive-mode does just that: https://recursive-mode.dev/introduction

reply

upvote

by elAhmo19 hours ago|

[-]

Don't you have that by just resuming old convo?

The only issue is that it didn't hit the cache so it was expensive if you resume later.

reply

upvote

by eknkc19 hours ago|

[-]

Not at the moment apparently. They remove the thinking messages when you continue after 1 hour. That was the whole idea of that change. So the LLM gets all your messages, its responses etc but not the thinking parts, why it generated that responses. You get a lobotomised session.

reply

upvote

by elAhmo18 hours ago|

[-]

OK didn't know that. I also resume fairly old sessions with 100-200k of context, and I sometimes keep them active for a while (but with large breaks in between).

Still on Opus 4.6 with no adaptive thinking, so didn't really notice anything worse in the past weeks, but who knows.

reply

upvote

by tbrockman19 hours ago|

[-]

Or generate tiny filler messages every hour until you come back to it.

reply