undefined

points

by embedding-shape5 hours ago |

comments

by 3uler4 hours ago|

[-]

Opencode has really bad cache stability issues that they seem uninterested in fixing at the moment.

by dathery2 hours ago|

parent|

[-]

The OpenCode devs talk about this on Twitter a lot, e.g. https://xcancel.com/thdxr/status/2048268697790300343

> tool call pruning breaks cache and people will tell you this is horrible and expensive

> except i looked at some anthropic data and real user behavior ends up with better cache hits and 30% less spend

> even this is needs to be analyzed further, it's just not simple

> for openai data it's inverted! cache hit ratio is actually better [sic: I think he meant worse based on the screenshot] with tool call pruning turned on

> but the net $ saved is only 5%

> kimi is a funny one - it has better cache hits with pruning on...but is also more expensive!

There was also another thread recently where he discussed that pruning improves user experience (models are smarter with less context) but I can't find it.

This can also be disabled in the config: https://opencode.ai/docs/config/#compaction

by soerxpso2 minutes ago|

parent|

[-]

My understanding of caching with most models/providers is that a prefix substring of the context has to be reused for a cache hit, but not necessarily the whole entire context window. So if you prune tool calls from the history, you're going to get one cache miss on the newly-pruned history, and then you're going to be getting cache hits on every subsequent turn, with a lower number of input tokens. If you prune subsequent tool calls after that, you would still get a cache hit for the already-pruned portion of the context, just not the full context.

by hirako200029 minutes ago|

parent|

prev|

[-]

They are. Empirical evidence on my side. Because attention is sparse across the context. It's not truly treating a million token the way it treats a fraction of that count. For performance.

by huqedato3 hours ago|

parent|

prev|

[-]

I can't confirm this. Having utilized Opencode for a large project over the past 10 months, with multiple models and agents, we've never run into such 'cache stability issues'."

by embedding-shape4 hours ago|

parent|

prev|

[-]

That'd be really easy to spot and also fix, most likely. Any open issue you could point us to, must surely been reported already?

by nolok3 hours ago|

parent|

[-]

> That'd be really easy to spot and also fix, most likely

Ah, reminds me of good old "There are only 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors."

by criemen3 hours ago|

parent|

[-]

> Ah, reminds me of good old "There are only 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors."

You quip, but LLM KV caching (from the harness side) is quite easy: You get a cache hit on stable prompt prefixes, period. That means you want to keep the prefix stable, and only append at the end of the conversation. Made up example: Don't put the git branch name into the system prompt part (that comes first), as whenever the branch name changes, that'd trigger a cache invalidation of the entire prompt.

Getting this right requires some care to not by accident modify the prefix, basically, and some design on communicating the things that can change (user configuration, working dir, git information, ...).

by franknord231 hours ago|

parent|

[-]

That sounds like the experience of writing Containerfiles; since steps are cached you want to pull the thing you are iterating on as far down as possible.

by gopher_space13 minutes ago|

parent|

[-]

All of this work has been done before in different contexts. Memory management with bigger blocks and weaker definitions that change whenever some grad student gets a bright idea.

by xcjsam3 hours ago|

parent|

prev|

[-]

[dead]

by krzyk3 hours ago|

parent|

prev|

[-]

Opencode (and other coding agents) have hundreds of open issues reported. It is quite discouraging when they are not being closed/fixed.

by jsjsjsuduiwkw2 hours ago|

parent|

[-]

[dead]

by metalspot58 minutes ago|

parent|

prev|

[-]

I am getting 98.6% cache hit ratio on deepseek-v4-flash with opencode

by bobkb22 minutes ago|

parent|

[-]

That’s impressive!

On the sheer performance it’s comparable to Opus ?

by Bombthecat3 hours ago|

parent|

prev|

[-]

[flagged]

by kiproping2 hours ago|

prev|

[-]

This would be a better page to link to https://github.com/esengine/DeepSeek-Reasonix/blob/main/docs...

They explain some of the the reasons why they have a better solution and why they are very opinionated

>Automatic prefix caching activates only when the exact byte prefix of the previous request matches. Most agent loops reorder, rewrite, or inject fresh timestamps each turn — cache hit rate in practice: <20%.

So they optimize on this plus other techniques to improve cache hits, making it cheaper.

by tontinton1 hours ago|

prev|

[-]

Yep exactly my thoughts, went and looked at the code for the deepseek provider in my coding agent. and basically all of what the author wrote there is implemented... http://github.com/tontinton/maki for the curios

by bwfan1235 hours ago|

prev|

[-]

> I wrote a tiny little bridge so I could use DeepSeek V4 Pro via Codex

Can you share the bridge. DeepSeek v4 is awesome paired with claude-code or opencode. I found that claude code costs me less than opencode and I am presuming this is due to a better engineered harness.

by embedding-shape4 hours ago|

parent|

[-]

Sure, keep in mind it's a steaming pile of hacked together hacks, probably won't work in every case, doesn't support every feature that should be supported (like parallel tool calling, both Codex + DeepSeek API support it), and it might make your computer catch on fire: https://gist.github.com/embedding-shapes/eab3e63e5a95d3d78a2...

I only used it for a few hours to play around with stuff before the quota issue was fixed and I could resume using GPT models, and the bridge was coded by DeepSeek-V4-Flash-IQ2XXS + DwarfStar4 locally, I take no responsibility for what might happen with your computer or you, during usage or just reading the code.

Edit: heh, like don't look at line 117 for example where seemingly it likes to handle misspellings in the .env file which totally wasn't my fault for typo'ing the API key in that file... I'm sure there are tons of sharp edges and dumb stuff in there.

by bayesianbot2 hours ago|

parent|

prev|

[-]

LiteLLM can serve OpenAI API endpoint IIRC and proxy that to other providers like DeepSeek, should work with Codex

by Den_VR4 hours ago|

parent|

prev|

[-]

I’m feeling more a novice every day, but how isn’t this just handing over your code to team deepseek for whatever they might want

by embedding-shape4 hours ago|

parent|

[-]

Not everyone is working with state secrets or user personal data (or even more closely guarded, company secrets) on a daily basis, most of what I hack on is either FOSS already, or will be, not much to keep secret here.

Obviously, if you do deal with any sort of secrets, then using local LLMs over OpenAI, Anthropic, DeepSeek or whoever is obviously preferred, and in the case of personal data of users, probably a requirement.

by jack_pp2 hours ago|

parent|

[-]

either this or you work on software that even if copied won't get you far since the business relies on network effects or pure networking.

Getting the source code of facebook or instagram doesn't mean you could compete with them.

I work for a company that has built relationship with event organizers over the past 10 years. The code I maintain could be written from scratch in maybe 2-3 months even though it was built over the past 10 years but besides that you have frontend / DB / hardware / logistics etc

by Demiurge45 minutes ago|

parent|

[-]

I actually agree with you, for the most part. The code I work with actually does contain some valuable algorithms, but Im pretty sure the effort of integrating them into a larger system is pointless without the data. It’s almost like stealing half-life 2 source code without any assets.

Still, “Getting the source code of facebook or instagram doesn't mean you could compete with them.” I think to giants like that, having access to their source code could open up some very interesting loop holes for manipulating the ranking algorithms, or even security vulnerabilities.

by jack_pp40 minutes ago|

parent|

[-]

True, haven't thought of that. However very few actual projects / companies are in a situation where the chinese GOVT would be interested to spend resources to hack your platform. For the ones that are afraid of that there's always self hosting of course

by oldmanhorton3 hours ago|

parent|

prev|

[-]

You’re not a novice, there are a lot of us who know exactly what we are doing and see this as a huge downside. We are just being told to go faster, faster, faster lest we miss out on… something?

by jijji3 hours ago|

parent|

prev|

[-]

there's laws on the books in China that says that every company operating in China must aid and abet the Chinese government in espionage against the rest of the world. given those facts, I find it deeply troubling to be using anything coming out of China, especially a program that runs in the context of a Linux terminal on a machine that might have something important on it. I'd argue it's a back door waiting to happen, if not sooner than obviously later.

by tim-projects33 minutes ago|

parent|

[-]

Is it not better to have a country far away spying on you than your own country?

by goobatrooba56 minutes ago|

parent|

prev|

[-]

As a European I have to admit I am these days more worried about the US than China. See yesterday's article about the US government forcing Microsoft to give them lists of Dutch government officials. Utter madness. At least the Chinese mainly care about the money and power levers, the US about strange worlds of revenge and manipulation, trying to change or influence your government. E.g. which of the two countries has put crippling personal sanctions on staff of the international criminal court?

Honestly I'd love to love the US again, but basically after Obama things have just gone down and down and no soul will trust the US again in the next generation or two.

by _3u101 hours ago|

parent|

prev|

[-]

FISA section 702 / Five eyes / Room 641A.

by himata41135 hours ago|

prev|

[-]

this appears to be native to the terminal, as in, there's no special application that runs or wraps an agent inside a tui. So basically instead of commands you type plain english?

by embedding-shape5 hours ago|

parent|

[-]

> this appears to be native to the terminal, as in, there's no special application that runs or wraps an agent inside a tui

Same with codex? codex-rs at least, is a TUI as well, it does run a "app-server" in the background, that the TUI actually interacts with, but that's just an implementation detail. Also makes it easy to hook in your own programs to fire of codex "headless" sessions even without the TUI.