undefined

points

by 3uler4 hours ago |

comments

by dathery2 hours ago|

[-]

The OpenCode devs talk about this on Twitter a lot, e.g. https://xcancel.com/thdxr/status/2048268697790300343

> tool call pruning breaks cache and people will tell you this is horrible and expensive

> except i looked at some anthropic data and real user behavior ends up with better cache hits and 30% less spend

> even this is needs to be analyzed further, it's just not simple

> for openai data it's inverted! cache hit ratio is actually better [sic: I think he meant worse based on the screenshot] with tool call pruning turned on

> but the net $ saved is only 5%

> kimi is a funny one - it has better cache hits with pruning on...but is also more expensive!

There was also another thread recently where he discussed that pruning improves user experience (models are smarter with less context) but I can't find it.

This can also be disabled in the config: https://opencode.ai/docs/config/#compaction

by soerxpso1 minutes ago|

parent|

[-]

My understanding of caching with most models/providers is that a prefix substring of the context has to be reused for a cache hit, but not necessarily the whole entire context window. So if you prune tool calls from the history, you're going to get one cache miss on the newly-pruned history, and then you're going to be getting cache hits on every subsequent turn, with a lower number of input tokens. If you prune subsequent tool calls after that, you would still get a cache hit for the already-pruned portion of the context, just not the full context.

by hirako200028 minutes ago|

parent|

prev|

[-]

They are. Empirical evidence on my side. Because attention is sparse across the context. It's not truly treating a million token the way it treats a fraction of that count. For performance.

by huqedato3 hours ago|

prev|

[-]

I can't confirm this. Having utilized Opencode for a large project over the past 10 months, with multiple models and agents, we've never run into such 'cache stability issues'."

by embedding-shape4 hours ago|

prev|

[-]

That'd be really easy to spot and also fix, most likely. Any open issue you could point us to, must surely been reported already?

by nolok3 hours ago|

parent|

[-]

> That'd be really easy to spot and also fix, most likely

Ah, reminds me of good old "There are only 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors."

by criemen3 hours ago|

parent|

[-]

> Ah, reminds me of good old "There are only 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors."

You quip, but LLM KV caching (from the harness side) is quite easy: You get a cache hit on stable prompt prefixes, period. That means you want to keep the prefix stable, and only append at the end of the conversation. Made up example: Don't put the git branch name into the system prompt part (that comes first), as whenever the branch name changes, that'd trigger a cache invalidation of the entire prompt.

Getting this right requires some care to not by accident modify the prefix, basically, and some design on communicating the things that can change (user configuration, working dir, git information, ...).

by franknord231 hours ago|

parent|

[-]

That sounds like the experience of writing Containerfiles; since steps are cached you want to pull the thing you are iterating on as far down as possible.

by gopher_space12 minutes ago|

parent|

[-]

All of this work has been done before in different contexts. Memory management with bigger blocks and weaker definitions that change whenever some grad student gets a bright idea.

by xcjsam3 hours ago|

parent|

prev|

[-]

[dead]

by krzyk3 hours ago|

parent|

prev|

[-]

Opencode (and other coding agents) have hundreds of open issues reported. It is quite discouraging when they are not being closed/fixed.

by jsjsjsuduiwkw2 hours ago|

parent|