Cache TTL on Max subscriptions is 1h, FYI.
reply
Only if you set `ENABLE_PROMPT_CACHING_1H`, which was mentioned in the release notes for a recent Claude Code release but doesn't seem to be in the official docs.
reply
Subscription users supposedly get it automatically again after the fix (and now also with `DISABLE_TELEMETRY=1`).

But if you are an API user, you must set `ENABLE_PROMPT_CACHING_1H`, as I understood it.

And when using your own API endpoint (via `ANTHROPIC_BASE_URL`), ensure `CLAUDE_CODE_ATTRIBUTION_HEADER=0` is set as well: https://github.com/anthropics/claude-code/issues/50085
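
For anyone scripting this, here is a rough Python sketch of the environment described above. The variable names come from this thread; the helper `build_claude_env`, its parameters, and the value `"1"` for `ENABLE_PROMPT_CACHING_1H` are my own assumptions, since nothing here confirms what value the flag expects.

```python
import os


def build_claude_env(base_env, api_user, custom_base_url=None):
    """Return a copy of base_env with the cache-related variables from this thread.

    Hypothetical helper: the variable names are taken from the comments above,
    but the value "1" for ENABLE_PROMPT_CACHING_1H is an assumption.
    """
    env = dict(base_env)
    if api_user:
        # API users reportedly have to opt in to the 1h cache explicitly.
        env["ENABLE_PROMPT_CACHING_1H"] = "1"
    if custom_base_url is not None:
        # With a custom endpoint, the thread suggests disabling the attribution header.
        env["ANTHROPIC_BASE_URL"] = custom_base_url
        env["CLAUDE_CODE_ATTRIBUTION_HEADER"] = "0"
    return env


if __name__ == "__main__":
    # The URL is a placeholder, not a real endpoint.
    env = build_claude_env(os.environ, api_user=True,
                           custom_base_url="https://proxy.example.invalid")
    for name in ("ENABLE_PROMPT_CACHING_1H", "ANTHROPIC_BASE_URL",
                 "CLAUDE_CODE_ATTRIBUTION_HEADER"):
        print(name, "=", env[name])
```

The resulting dict could be passed as `env=` to `subprocess.run` when launching the CLI, so the settings apply to that one process and do not leak into the rest of your shell.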

Also check out the other neckbreakers I've found; it feels like a lot of malicious compliance... :/

[BUG] new sessions will *never* hit a (full)cache #47098 https://github.com/anthropics/claude-code/issues/47098

[BUG] /clear bleeds into the next session (what also breaks cache) #47756 https://github.com/anthropics/claude-code/issues/47756

[BUG] uncachable system prompt caused by includeGitInstructions / CLAUDE_CODE_DISABLE_GIT_INSTRUCTIONS -> git status https://github.com/anthropics/claude-code/issues/47107

reply
Bruh. It's getting hard to track down all these MAKE_IT_ACTUALLY_WORK settings that default to off for no reason.
reply
That'd be awesome, but it doesn't reflect what I see. Do you have a source for that? What I see is that if I take a quick break, the session loses ~5% right at the start of the next prompt processing. (I'm currently on Max 5x.)
reply
Not at my workstation right now, but simply ask Claude to analyze the JSONL transcript of any session. There are two cache keys there, one for 5m and another for 1h, and only the 1h one gets set. There are also some entries that will tell you whether a request was a cache hit or miss, or whether a cache rewrite happened. I've had Claude test another Claude, and on a Max 5x subscription a cache miss only happened if the message was sent after 1h, or if the session was resumed using /resume or --resume (this is a bug that has existed since January: all session resumes cause a full cache rewrite).
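
If you would rather check this yourself than ask Claude, something like the sketch below should do it. It assumes each transcript line is a JSON object whose `message.usage` block mirrors the Anthropic Messages API usage object (`cache_read_input_tokens`, plus `cache_creation.ephemeral_5m_input_tokens` and `cache_creation.ephemeral_1h_input_tokens`). The transcript layout is an assumption on my part, and `summarize_cache_usage` is a hypothetical helper name.

```python
import json


def summarize_cache_usage(jsonl_lines):
    """Sum cache read/write token counts over the lines of a JSONL transcript.

    Assumes usage lives under record["message"]["usage"]; lines that do not
    match that shape (or are not valid JSON) are skipped.
    """
    totals = {"cache_read": 0, "cache_write_5m": 0, "cache_write_1h": 0}
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        message = record.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue
        creation = usage.get("cache_creation")
        if not isinstance(creation, dict):
            creation = {}
        totals["cache_read"] += usage.get("cache_read_input_tokens") or 0
        totals["cache_write_5m"] += creation.get("ephemeral_5m_input_tokens") or 0
        totals["cache_write_1h"] += creation.get("ephemeral_1h_input_tokens") or 0
    return totals


if __name__ == "__main__":
    import sys
    # Usage: python summarize.py path/to/session.jsonl
    with open(sys.argv[1], encoding="utf-8") as handle:
        print(summarize_cache_usage(handle))
```

A session where only `cache_write_1h` is non-zero would match what is described above; a large `cache_write_*` total right after a resume would be consistent with the full-rewrite bug.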

However, cache being hit doesn't necessarily mean Anthropic won't just subtract usage from you as if it wasn't hit. It's Anthropic we're talking about. They can do whatever they want with your usage and then blame you for it.

reply
I have heard that if you have telemetry disabled the cache is 5 minutes, otherwise 1h. No clue how true that is; however, my experience (with telemetry enabled) has been the 1h cache.
reply
They've acknowledged that as a bug and have fixed it.
reply
It's true as far as I can tell, just by my own checking using `/status`. You can also tell by when the "clear" reminder hint shows up. Also if you look at the leaked claude code you can see that almost everything in the main thread is cached with 1H TTL (I believe subagents use 5 minute TTL)
reply
>pay for reinitializing the cache

Why can't they save the kv cache to disk then later reload it to memory?

reply
It’s a shitload of data, and it only works if all the tokens are 100% identical, i.e. all the attention values are exactly the same.

Typically it’s cached for about 5 minutes, you can pay extra for longer caches.
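
To make the "identical tokens plus a timer" idea concrete, here is a toy Python model. It is not how any provider actually implements it: `ToyPrefixCache` is a made-up class, it only matches whole prompts rather than partial prefixes, and it assumes the timer restarts on every hit (which is how Anthropic documents its prompt cache).

```python
import hashlib


class ToyPrefixCache:
    """Toy model: an entry is hit only if the tokens match exactly and it has not expired."""

    def __init__(self, ttl_seconds=300.0):
        self.ttl = ttl_seconds
        self._expires_at = {}  # prompt hash -> expiry time in seconds

    @staticmethod
    def _key(tokens):
        # Any single differing token produces a different key.
        return hashlib.sha256("\x1f".join(tokens).encode("utf-8")).hexdigest()

    def lookup(self, tokens, now):
        """Return True on a hit; either way, (re)start the TTL for this prompt."""
        key = self._key(tokens)
        hit = self._expires_at.get(key, float("-inf")) > now
        self._expires_at[key] = now + self.ttl  # write on miss, refresh on hit
        return hit


if __name__ == "__main__":
    cache = ToyPrefixCache(ttl_seconds=300)
    prompt = ["system", "tools", "turn-1"]
    print(cache.lookup(prompt, now=0))    # first request: miss, entry written
    print(cache.lookup(prompt, now=200))  # within 5 minutes: hit, timer restarts
    print(cache.lookup(prompt, now=800))  # 10 minutes idle: miss again
```

In this model the clock measures idle time since the last request that touched the entry, so steady prompting keeps it warm and a break longer than the TTL forces a rewrite.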

reply
Probably because the costly operation is loading it onto the GPU, doesn't matter if it's from disk or from your request.
reply
The point of prompt caching is to save on prefill which for large contexts (common for agentic workloads) is quite expensive per token. So there is a context length where storing that KV-cache is worth it, because loading it back in is more efficient than recomputing it. For larger SOTA models, the KV cache unit size is also much smaller compared to the compute cost of prefill, so caching becomes worthwhile even for smaller context.
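
A back-of-envelope sketch of that trade-off in Python. Everything numeric here is hypothetical (the model shape, the bandwidth, the effective FLOP rate), and prefill cost uses the common 2 × parameters × tokens approximation, which ignores the attention term that grows with context length.

```python
def kv_cache_bytes_per_token(layers, kv_heads, head_dim, bytes_per_value=2):
    """Bytes of KV cache per token: one K and one V vector per layer per KV head."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def prefill_flops(param_count, tokens):
    """Rough dense-model prefill cost: about 2 FLOPs per parameter per token."""
    return 2 * param_count * tokens


def reload_vs_recompute_seconds(tokens, *, layers, kv_heads, head_dim,
                                param_count, load_bytes_per_s, flops_per_s):
    """Return (seconds to load a stored KV cache, seconds to recompute it via prefill)."""
    load = tokens * kv_cache_bytes_per_token(layers, kv_heads, head_dim) / load_bytes_per_s
    compute = prefill_flops(param_count, tokens) / flops_per_s
    return load, compute


if __name__ == "__main__":
    # Hypothetical 70B-parameter model with grouped-query attention and 16-bit values.
    load_s, compute_s = reload_vs_recompute_seconds(
        100_000,
        layers=80, kv_heads=8, head_dim=128,
        param_count=70e9,
        load_bytes_per_s=25e9,  # assumed storage-to-GPU bandwidth
        flops_per_s=4e14,       # assumed effective prefill throughput
    )
    print(f"reload: {load_s:.2f}s, recompute: {compute_s:.2f}s")
```

With these made-up numbers reloading wins comfortably. The per-token ratio between the two costs does not depend on context length in this simplified model, so where the break-even actually sits comes down to fixed overheads and storage cost, which the sketch leaves out.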
reply
Yeah, the caching change is probably 90% of the “I run out of usage so fast now!” issues.
reply
Ah, I can see how my phrasing might be misleading, but these prompts were made within 5 minutes of each other; the timings I mentioned were how long Claude spent working.
reply
Is it 5 mins between constant prompting/work, or 5 mins as in if I step away from the computer for 5 mins and come back and prompt again, I'm subject to reinit?

If it's the latter, that's crazy. I don't even know what to do there; compactions already feel like a memory wipe.

reply