DeepSeek reasonix, DeepSeek native coding agent with high caching and low cost

upvote

DeepSeek reasonix, DeepSeek native coding agent with high caching and low cost

(esengine.github.io)

229 points

by Alifatisk6 hours ago |

upvote

by embedding-shape4 hours ago|

[-]

I'm not sure you need a "DeepSeek native coding agent" to take advantage of DeepSeeks cache, yesterday as the Codex quota usage issue still wasn't solved for me, I wrote a tiny little bridge so I could use DeepSeek V4 Pro via Codex, and seems most of everything I did was basically cached as far as I can tell: https://i.imgur.com/7eKn6wN.png (2026-05-23 Input (Cache hit): 39,123,200 tokens, Input (Cache miss) 1,692,286), and the bridge is doing not special, just massage the DeepSeek API shape into what Codex expects, nothing particular about caching at all.

Besides being even better at the caching, I'm not sure what benefits you'd get compared to just firing up OpenCode with the DeepSeek API yourself, it'll similarly do caching for sure and also "talks directly to api.deepseek.com" if that matters, and you'll get a much more mature harness.

reply

upvote

by tontinton55 minutes ago|

[-]

Yep exactly my thoughts, went and looked at the code for the deepseek provider in my coding agent. and basically all of what the author wrote there is implemented... http://github.com/tontinton/maki for the curios

reply

upvote

by 3uler3 hours ago|

[-]

Opencode has really bad cache stability issues that they seem uninterested in fixing at the moment.

reply

upvote

by dathery2 hours ago|

[-]

The OpenCode devs talk about this on Twitter a lot, e.g. https://xcancel.com/thdxr/status/2048268697790300343

> tool call pruning breaks cache and people will tell you this is horrible and expensive

> except i looked at some anthropic data and real user behavior ends up with better cache hits and 30% less spend

> even this is needs to be analyzed further, it's just not simple

> for openai data it's inverted! cache hit ratio is actually better [sic: I think he meant worse based on the screenshot] with tool call pruning turned on

> but the net $ saved is only 5%

> kimi is a funny one - it has better cache hits with pruning on...but is also more expensive!

There was also another thread recently where he discussed that pruning improves user experience (models are smarter with less context) but I can't find it.

This can also be disabled in the config: https://opencode.ai/docs/config/#compaction

reply

upvote

by huqedato2 hours ago|

[-]

I can't confirm this. Having utilized Opencode for a large project over the past 10 months, with multiple models and agents, we've never run into such 'cache stability issues'."

reply

upvote

by metalspot24 minutes ago|

[-]

I am getting 98.6% cache hit ratio on deepseek-v4-flash with opencode

reply

upvote

by embedding-shape3 hours ago|

[-]

That'd be really easy to spot and also fix, most likely. Any open issue you could point us to, must surely been reported already?

reply

upvote

by nolok2 hours ago|

[-]

> That'd be really easy to spot and also fix, most likely

Ah, reminds me of good old "There are only 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors."

reply

upvote

by criemen2 hours ago|

[-]

> Ah, reminds me of good old "There are only 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors."

You quip, but LLM KV caching (from the harness side) is quite easy: You get a cache hit on stable prompt prefixes, period. That means you want to keep the prefix stable, and only append at the end of the conversation. Made up example: Don't put the git branch name into the system prompt part (that comes first), as whenever the branch name changes, that'd trigger a cache invalidation of the entire prompt.

Getting this right requires some care to not by accident modify the prefix, basically, and some design on communicating the things that can change (user configuration, working dir, git information, ...).

reply

upvote

by franknord231 hours ago|

[-]

That sounds like the experience of writing Containerfiles; since steps are cached you want to pull the thing you are iterating on as far down as possible.

reply

upvote

by xcjsam2 hours ago|

[-]

[dead]

reply

upvote

by krzyk2 hours ago|

[-]

Opencode (and other coding agents) have hundreds of open issues reported. It is quite discouraging when they are not being closed/fixed.

reply

upvote

by jsjsjsuduiwkw2 hours ago|

[-]

[dead]

reply

upvote

by Bombthecat3 hours ago|

[-]

[flagged]

reply

upvote

by kiproping1 hours ago|

[-]

This would be a better page to link to https://github.com/esengine/DeepSeek-Reasonix/blob/main/docs...

They explain some of the the reasons why they have a better solution and why they are very opinionated

>Automatic prefix caching activates only when the exact byte prefix of the previous request matches. Most agent loops reorder, rewrite, or inject fresh timestamps each turn — cache hit rate in practice: <20%.

So they optimize on this plus other techniques to improve cache hits, making it cheaper.

reply

upvote

by bwfan1234 hours ago|

[-]

> I wrote a tiny little bridge so I could use DeepSeek V4 Pro via Codex

Can you share the bridge. DeepSeek v4 is awesome paired with claude-code or opencode. I found that claude code costs me less than opencode and I am presuming this is due to a better engineered harness.

reply

upvote

by embedding-shape4 hours ago|

[-]

Sure, keep in mind it's a steaming pile of hacked together hacks, probably won't work in every case, doesn't support every feature that should be supported (like parallel tool calling, both Codex + DeepSeek API support it), and it might make your computer catch on fire: https://gist.github.com/embedding-shapes/eab3e63e5a95d3d78a2...

I only used it for a few hours to play around with stuff before the quota issue was fixed and I could resume using GPT models, and the bridge was coded by DeepSeek-V4-Flash-IQ2XXS + DwarfStar4 locally, I take no responsibility for what might happen with your computer or you, during usage or just reading the code.

Edit: heh, like don't look at line 117 for example where seemingly it likes to handle misspellings in the .env file which totally wasn't my fault for typo'ing the API key in that file... I'm sure there are tons of sharp edges and dumb stuff in there.

reply

upvote

by bayesianbot1 hours ago|

[-]

LiteLLM can serve OpenAI API endpoint IIRC and proxy that to other providers like DeepSeek, should work with Codex

reply

upvote

by Den_VR3 hours ago|

[-]

I’m feeling more a novice every day, but how isn’t this just handing over your code to team deepseek for whatever they might want

reply

upvote

by embedding-shape3 hours ago|

[-]

Not everyone is working with state secrets or user personal data (or even more closely guarded, company secrets) on a daily basis, most of what I hack on is either FOSS already, or will be, not much to keep secret here.

Obviously, if you do deal with any sort of secrets, then using local LLMs over OpenAI, Anthropic, DeepSeek or whoever is obviously preferred, and in the case of personal data of users, probably a requirement.

reply

upvote

by jack_pp2 hours ago|

[-]

either this or you work on software that even if copied won't get you far since the business relies on network effects or pure networking.

Getting the source code of facebook or instagram doesn't mean you could compete with them.

I work for a company that has built relationship with event organizers over the past 10 years. The code I maintain could be written from scratch in maybe 2-3 months even though it was built over the past 10 years but besides that you have frontend / DB / hardware / logistics etc

reply

upvote

by Demiurge11 minutes ago|

[-]

I actually agree with you, for the most part. The code I work with actually does contain some valuable algorithms, but Im pretty sure the effort of integrating them into a larger system is pointless without the data. It’s almost like stealing half-life 2 source code without any assets.

Still, “Getting the source code of facebook or instagram doesn't mean you could compete with them.” I think to giants like that, having access to their source code could open up some very interesting loop holes for manipulating the ranking algorithms, or even security vulnerabilities.

reply

upvote

by jack_pp6 minutes ago|

[-]

True, haven't thought of that. However very few actual projects / companies are in a situation where the chinese GOVT would be interested to spend resources to hack your platform. For the ones that are afraid of that there's always self hosting of course

reply

upvote

by oldmanhorton3 hours ago|

[-]

You’re not a novice, there are a lot of us who know exactly what we are doing and see this as a huge downside. We are just being told to go faster, faster, faster lest we miss out on… something?

reply

upvote

by jijji2 hours ago|

[-]

there's laws on the books in China that says that every company operating in China must aid and abet the Chinese government in espionage against the rest of the world. given those facts, I find it deeply troubling to be using anything coming out of China, especially a program that runs in the context of a Linux terminal on a machine that might have something important on it. I'd argue it's a back door waiting to happen, if not sooner than obviously later.

reply

upvote

by goobatrooba22 minutes ago|

[-]

As a European I have to admit I am these days more worried about the US than China. See yesterday's article about the US government forcing Microsoft to give them lists of Dutch government officials. Utter madness. At least the Chinese mainly care about the money and power levers, the US about strange worlds of revenge and manipulation, trying to change or influence your government. E.g. which of the two countries has put crippling personal sanctions on staff of the international criminal court?

Honestly I'd love to love the US again, but basically after Obama things have just gone down and down and no soul will trust the US again in the next generation or two.

reply

upvote

by _3u1038 minutes ago|

[-]

FISA section 702 / Five eyes / Room 641A.

reply

upvote

by himata41134 hours ago|

[-]

this appears to be native to the terminal, as in, there's no special application that runs or wraps an agent inside a tui. So basically instead of commands you type plain english?

reply

upvote

by embedding-shape4 hours ago|

[-]

> this appears to be native to the terminal, as in, there's no special application that runs or wraps an agent inside a tui

Same with codex? codex-rs at least, is a TUI as well, it does run a "app-server" in the background, that the TUI actually interacts with, but that's just an implementation detail. Also makes it easy to hook in your own programs to fire of codex "headless" sessions even without the TUI.

reply

upvote

by agrippanux42 minutes ago|

[-]

This website seems to have been generated by Codex - I asked Codex to create an HTML overview of a feature for my team and it made an overly produced monstrosity - complete with the same large stat boxes that were for the most part devoid of meaningful information - using the same font, colors, layout, hero section, etc. It was also terrible on mobile just like this is.

In the end I had Claude produce a one-page html file that was 95% of the way there and it took minor editing to clearly explain the intent of the feature.

reply

upvote

by carterschonwald6 minutes ago|

[-]

i cant find anything substantiated in the code that actually differentiates it from any other harness.

my fork of oh my pi that i have a lot of experiments in, is lterally designed to only work well with models that have decent reasoning levels, like deep seek models. check it out!

https://github.com/cartazio/oh-punkin-pi/blob/main/scripts/b... — thats the install script for after clone

fair warning: tis my dog food test bed as i build even fancier stuff

reply

upvote

by skeledrew4 hours ago|

[-]

Not a fan of that page. The animated typing and resulting continuous resize of the example keeps moving the content beneath it down and up. Such bad UX.

reply

upvote

by embedding-shape4 hours ago|

[-]

Agents or no agents, people still need to test their websites on different resolutions or at least window width, but seems this is becoming a lost art.

reply

upvote

by mirekrusin3 hours ago|

[-]

Yeah, doesn’t look designed for people who want to read it beyond animated typing animation.

reply

upvote

by m4rkuskk2 hours ago|

[-]

Claude design AI slob.

reply

upvote

by wg044 minutes ago|

[-]

Performance is horrible when you type but caching is magical.

Extremely pro consumer tool. I have been hammering it hard with 97% cache utilization and barely $0.03 dollar spent for me constantly exploring a codebase.

reply

upvote

by declan_roberts4 hours ago|

[-]

I love the focus on cache hit efficiency. Hats off to the deekseek team for creating a great product that maximizes cost efficiency for the user.

reply

upvote

by bwfan1234 hours ago|

[-]

> Hats off to the deekseek team for creating a great product

I have been using it for a while, and I wholeheartedly agree. imo, it is as good as codex or claude which I also use. It is a winner in the cost-sensitive tier, and if some startup could put it together with data-retention in mind, it could be a great product sold to the enterprise, as data-retention and privacy are the main issues for the coding-assistant usecase.

reply

upvote

by chillfox3 hours ago|

[-]

Deepseek v4 pro is definitely my preferred cheap model, it's very good, and I use it all the time for my personal projects (opencode go plan), but I also use Claude Opus all the time at work and Deepseek is not as good as that, but it does compete with Sonnet for capability, and beats it on price.

reply

upvote

by nicce2 hours ago|

[-]

Just in case, note that this project is someone's side project

> Independent open-source project · not affiliated with DeepSeek

reply

upvote

by Bombthecat3 hours ago|

[-]

Adding already cheap API cost and you probably could let it run for days and the same task..

reply

upvote

by stavros4 hours ago|

[-]

How can you have cache hit efficiency? Isn't it just a matter of not changing the previous context? I don't understand what knobs there are to tweak on this.

reply

upvote

by everforward3 hours ago|

[-]

> Isn't it just a matter of not changing the previous context?

Yes, but a lot of harnesses change previous context. E.g. the system prompt injects the current time/date, working directory, files in the working directory, etc. Compaction also changes the whole previous context. I _think_ changing the list of tools also invalidates cache, so invoking a subagent with different tools would invalidate the cache.

My vague impression is that it's in a similar vein to functional programming languages. It generally disallows doing things that lead to bugs (cache misses in this case), and presumably allows you to do those things in a way that makes it much clearer that this is likely to cause cache misses. I would guess that in this paradigm, you don't mutate your existing session, you derive a new session by mutating the prior context into a new context.

reply

upvote

by chillfox3 hours ago|

[-]

changing between plan/build mode in some agents will change the tools list, which breaks the cache.

reply

upvote

by brookst3 hours ago|

[-]

Cache is always there, it’s just that it only caches up to the point where an input token changes. So if the tools list is early in the prompt, changing it would limit cache for most of the prompt. If the tools list is the last thing, you could still get 99% cache hits even if it changes every turn.

reply

upvote

by RevEng1 hours ago|

[-]

After a couple of turns the system prompt is a small part of the context. Not changing the system prompt at all is key so that the rest of the history is itself part of the prefix.

reply

upvote

by storus1 hours ago|

[-]

Can it instruct DeepSeek during an LLM call to start removing old tool calls from the context instead of waiting for the LLM call to finish if the context size approaches DeepSeek's dumb zone? Claude Code can't do that, /compact can only happen after the LLM call; it's often preferable to start cleaning up context during an LLM call, especially when tool calls are huge like reading markdown files; implementation-wise all that is needed is to start removing earliest <tool call start> ... <tool call end> and replacing them just with some log entry stating this tool call was already performed, then re-running KV cache prefill (so the "online" compaction would get 0.5s latency hit every time it's performed). That way one can read 1000 files in one LLM call.

reply

upvote

by nextaccountic50 minutes ago|

[-]

> Tool-call repair

> Tool arguments the model produces occasionally have JSON typos, unclosed quotes, or shape mismatches. Reasonix runs a schema-aware repair pass before dispatch so malformed args still execute.

So Deepseek API doesn't have a structured output option where you give a grammar and the model promises the output will follow this grammar?

Or it does, but it's buggy?

reply

upvote

by unshavedyak3 hours ago|

[-]

It's pretty funny, i'm a $200/m Claude subscriber and i've had little need to use anything else. However the more Claude has been restricting my workflow (notably around the recent IDE/-p usage change) the more i've been wanting to go elsehwere.

I'm concerned since i really want SOTA reasoning, but DeepSeek still has me interested.

reply

upvote

by Alifatisk2 hours ago|

[-]

> I'm concerned since i really want SOTA reasoning

I think you should give other models a try and see how much they differ from SOTA models. I did this and realized, even Qwen-2.5-Max was enough. I am sure even Claude Sonnet 3.5 is enough for things I play around with. I am not really striving for fields medal in Mathematics.

reply

upvote

by unshavedyak20 minutes ago|

[-]

That's fair, neither am i - i do tend to work in large, complex, full of legacy decision based codebases. Eg i have access to Sonnet (of course), but i choose to solely work in Opus because i find its output reads better, analyzes better, etc.

The "cost" is dumb models is just so high for me. Eg every bad decision they make increases my frustration quite a bit. Despite putting a lot of effort into my workflow to help reduce the number of decisions they make, they always will. So my hedge is always against that.. trying to reduce how insane they can be heh.

reply

upvote

by 0xbadcafebee1 hours ago|

[-]

You should definitely stick to the $200 plan, and not try the $10 coding plans with open weight models and higher limits. Anthropic needs your money to stay solvent, and you'll sleep better knowing you're using SOTA.

reply

upvote

by gck144 minutes ago|

[-]

I gave a fairly complex reverse engineering task to DS-4 xhigh and GPT-5.5 xhigh today.

After about 6 hours, both ultimately failed to fully RE, however, there were some drastic differences:

DS stopped every 30 minutes or so, saying it did full RE and it should all work now, while in fact, it didn't complete even 1% of it. It also looked for shortcuts again and again, despite me prompting heavily that the specific shortcut may not be used. It was a complete and utter failure.

GPT-5.5, on the other hand, blew me away. It just did the right things, didn't jump to next steps until it was sure it completed the initial layers and had a full understanding of what's required. The only time I prompted it during the 6 hours was when I saw it going in the right direction and I could nudge it slightly towards an even better way. I never felt I was fighting it. Okay, maybe a little bit - after compaction, it sometimes would go on a "no I'm not helping you with reverse engineering" tangent, but it would resolve in a clean session.

I cancelled my Claude subscription a month ago, so I haven't tested that, but DeepSeek has reminded me a lot of how I worked with Opus 4.6/4.7. Which perhaps could be a positive sign to some, but GPT-5.5 showed me that the way claude/ds work is just way too annoying.

reply

upvote

by ttul18 minutes ago|

[-]

What you’re experiencing is the difference in model intelligence. Most models can seem pretty good at simple stuff over short time horizons. Complex work requires that more intelligence be stuffed into those trillion-dimensional spaces.

reply

upvote

by KronisLV1 hours ago|

[-]

> i've been wanting to go elsehwere.

There's always the option of using Anthropic's models for some tasks like planning and then just hand over the implementation task to something like DeepSeek. Across different tools, a Markdown plan works pretty okay. That's what I'm planning to do if I go from the 5x Max subscription down to the Pro.

I am also writing a launcher that makes using 3rd party providers with Claude Code easy (https://ccode.kronis.dev) and I already have a local proxy up and running, just not dynamic model switching yet. Though it shouldn't be too hard to add, will probably be there within a week or two, depending on my schedule.

I don't think it's wise to leave Anthropic altogether because their models are great (and a subscription gives you features like Remote Control which I like), but switching tiers and maybe saving a bit of money seems viable! On the other hand, you do need a quality baseline, because I remember using Cerebras with GLM 4.6 way back and there was a bit too much slop.

reply

upvote

by logicchains2 hours ago|

[-]

If you want SOTA reasoning you should be using GPT 5.5 Pro.

reply

upvote

by unshavedyak45 minutes ago|

[-]

This is fair, but i've found the different models to have different moods and require different interactions to get them to stick to just the specific edits i ask for, etc.

I used to surf the three big players frequently and got really tired of the effort needed to steer some models. In the end i ended up sticking with Claude because it required less steering effort. While not strictly reasoning, a models ability to follow clear directions consistently is something i'd consider part of its SOTA capabilities.

Eventually i just tired of exploring. I just want stability.

Which ironically is why i'm thinking about moving from Claude. The very basic IDE/-p usage getting removed from my plan is a UX stability issue. I'm trying to progressively improve my workflows and efficiency, not have to establish a new foundation anytime something shifts. Quite frustrating.

reply

upvote

by auggierose1 hours ago|

[-]

Codex has only GPT 5.5

reply

upvote

by schaefer3 hours ago|

[-]

Okay, I'm curious.

From the FAQ, I see:

>Can I point it at a self-hosted / private DeepSeek endpoint?

>Yes. Since 0.30 we accept non-standard key prefixes for self-hosted DeepSeek endpoints. Just point `baseUrl` at your internal address — the loop, cache strategy, and tool protocol are unchanged.

But my question is: If I use Reasonix to talk to a deepseek endpoint through openrouter, am I still getting the cache-hit benifits of this agent harness?

reply

upvote

by csunoser3 hours ago|

[-]

Yes*. At least from my limited usage of deepseek-flash for a few billion tokens on openrouter, the cache-hit rate is >95%. And I simply used the claude code harness pointed at the openrouter anthropic compatible endpoint with no fluff.

reply

upvote

by schaefer3 hours ago|

[-]

thank you!

reply

upvote

by danborn262 hours ago|

[-]

High caching rates for coding agents can drastically reduce latency and API costs. I am curious to see how the caching strategy handles context invalidation across multiple files.

reply

upvote

by xcjsam2 hours ago|

[-]

[flagged]

reply

upvote

by imagetic2 hours ago|

[-]

https://shittycodingagent.ai

reply

upvote

by mi_lk1 hours ago|

[-]

Not sure about the story but it would be funny if pi folks actually own this domain.

reply

upvote

by chuckadams1 hours ago|

[-]

They do. That's Pi's old name.

reply

upvote

by chabes2 hours ago|

[-]

Aka pi.dev

reply

upvote

by mmaunder3 hours ago|

[-]

Unusable thanks to the top animation pushing the rest of the site down repeatedly as you’re trying to read.

reply

upvote

by singiamtel3 hours ago|

[-]

I would've liked benchmarks against other harnesses showing the caching performance

reply

upvote

by Alifatisk2 hours ago|

[-]

Is there benchmarks and measurements that offers comparisons between different harnesses?

reply

upvote

by mmarcant43 minutes ago|

[-]

"byte-stable prefix cache" -- give us your codebase in a way that's even EASIER for us to train on.

reply

upvote

by singingtoday56 minutes ago|

[-]

That site does not render correctly on my android. Lots of text on the right breaking the reactive layout.

reply

upvote

by hebetude3 hours ago|

[-]

Wow the UI looks exactly what I vibe coded yesterday. What a coincidence

reply

upvote

by huqedato2 hours ago|

[-]

It's obvious why...

reply

upvote

by hirako20004 hours ago|

[-]

Good timing given the cost spike across other frontier models.

reply

upvote

by notjes4 hours ago|

[-]

Good thing DS just made their discount permanent. https://x.com/deepseek_ai/status/2057854261699195173

reply

upvote

by m1011 hours ago|

[-]

For those of you that use deepseek v4 occasionally, what harness do you use it with? I’m only familiar with claude code and codex.

Any comments on what you can or cannot rely on it for relative to cc and codex would be appreciated too!

reply

upvote

by eikenberry5 minutes ago|

[-]

Maybe check out Goose. It is the standard agent harness being developed by The Linux Foundation under the AAIF. Under active development and the implementation seems to have a good leg up on the other popular agents.

https://github.com/aaif-goose/goose

https://goose-docs.ai/

reply

upvote

by droidjj1 hours ago|

[-]

Check out pi.dev. OpenCode is a nice batteries-included Claude Code replacement, but I’m in love with the extensibility of Pi.

reply

upvote

by chuckadams55 minutes ago|

[-]

Any Pi extensions you'd specifically recommend? I'm just starting out with Pi, but I've had mixed results with extensions. I'm using Pi with gemma4 26b locally, so anything that's friendly to small local models would be appreciated. I think the only extension I'm using right now is pi-total-recall.

reply

upvote

by gck13 minutes ago|

[-]

I think pi wants you to write your own extensions, adapted to your meeds.

I haven't had a need for any extensions though. Maybe subagents, but I solved that with tmux. For all the rest, I just use "skills".

reply

upvote

by theanonymousone4 hours ago|

[-]

Isn't caching a server-side thing? How does the agent affect it, significantly at least?

reply

upvote

by embedding-shape4 hours ago|

[-]

Say you put the current time down to the second in the system prompt, which is the message that goes in front of the entire conversation, then basically nothing will be cached, every agent turn needs to ingest the entire session over and over. Contrast to not doing that, and the backend can leverage caching all the way up to the latest message, as nothing until then changed.

reply

upvote

by esperent3 hours ago|

[-]

Surely other agent CLIs are not dumb enough to invalidate cache on every turn over something so obvious?

reply

upvote

by chillfox3 hours ago|

[-]

I don't think any the agents breaks caching on every turn, but they might do things like current list of files, or available tools depending upon plan/build mode... or lots of other things that breaks caching multiple times during a session.

reply

upvote

by brookst3 hours ago|

[-]

Probably not that exactly, but there is a tradeoff between effectiveness of the prompt and cache hit rate. If putting the user’s datetime in the middle of the prompt scores higher on evals but worsens cache hits, versus at the end of the prompt where it’s cache friendly but may not be as effective, what do you do?

This is still art as much as science and the different harnesses take different approaches.

reply

upvote

by embedding-shape3 hours ago|

[-]

Obviously not, most agents properly keep previous messages unchanged, at least the major ones I've been digging into the source off. Also, everything would get so much slower, that even developers creating their own agents would notice quickly how much slower theirs is, if they fuck this up.

reply

upvote

by theanonymousone2 hours ago|

[-]

Yes, of course you can destroy it. But how far can you "improve", beyond decent "common sense" behaviour.

reply

upvote

by yalogin2 hours ago|

[-]

Can someone give me a eli5 version of what this is? It really sounds useful to Claude subscribers.

Is this improving the cache hit and hence overall efficiency of coding workflows?

Does it also let me host a local llm (deepseek)? What are model min requirements for this?

reply

upvote

by timcobb2 hours ago|

[-]

You can also ask Claude and get an immediate answer, the power is yours

reply

upvote

by Salgat35 minutes ago|

[-]

Certainly you realize that these comments exist for more than a single person right? You expect potentially hundreds of viewers to each burn through AI tokens instead of just getting a direct and relevant answer here? This has the same vibe as the old forum posts where the only response was a "google it".

reply

upvote

by fouric2 hours ago|

[-]

I don't think it's particularly effective to create a new coding agent when there's existing open-source agents (especially extremely extensible ones like Pi) that already optimize for cache hits, have far larger communities, and work for providers other than Deepseek.

I specifically use multiple different models and providers, so this wouldn't be useful for me.

And it contributes to the problem of each person vibe-coding their own, incompatible, half-baked tool in a space, instead of contributing to a small set of tools and expanding them.

It'd be better to just extend an existing tool.

reply

upvote

by ricardobeat2 hours ago|

[-]

> The loop is append-only, engineered around DeepSeek's byte-stable prefix cache — long sessions hold 90%+ cache hit and input-token cost collapses to ~1/5. Terminal-first, leave it running.

AI marketing slop. This is how all models and coding harnesses work, isn't it?

The author claims (in another AI-written post):

> LangChain — along with every generic agent framework I checked — rebuilds the prompt every turn. Timestamps get injected. History gets reordered. Tool schemas re-serialize with different whitespace.

I haven't touched LangChain in a long, long time, but don't think any of the current harnesses, Claude Code, Pi, Crush, OpenCode etc do that except if you change configuration? Keeping the context stable for caching is a very basic principle and not a wild innovation.

This posing as DeepSeek-specific is also a mystery.

reply

upvote

by hmokiguess2 hours ago|

[-]

Click on the download page, it's hilarious. It has a lot of information about the "smart probe" on the download and it's a realtime probe you can rerun.

That's the pinnacle of AI slop over engineered garbage in my opinion. All of that information is noise.

reply

upvote

by pkulak2 hours ago|

[-]

Doesn't Pi Agent do exactly this? Assuming "append only" means they do some kind of compaction as well.

reply

upvote

by 4 hours ago|

[-]

deleted

reply

upvote

by quotemstr3 hours ago|

[-]

> no reordering, no marker-based compaction

Is this really the behavior you want? Yes, doing tool-result clearing and such will blow your cache, but if you do it only occasionally, it's still likely a win. Yes, cache hits are good, but not so good that it's okay to be profligate with context to preserve those precious, precious KVs.

reply

upvote

by Hfuffzehn2 hours ago|

[-]

This is really tickling the conspiracy theorist part of my brain.

"Independent open-source project · not affiliated with DeepSeek" "Reasonix only targets DeepSeek because..." "Why DeepSeek only? Can I swap to Claude / GPT? It's a design choice, not a limitation"

The lady doth protest too much, methinks?

Nicely timed shortly after the making the rebate permanent anouncement.

Could just be Chinese devs trying to help western devs with some software and a western facing marketing campaign to raise awareness. Could be DeepSeek astroturfing. Could be "someone" in China trying to get more access to western data.

Who knows?

reply

upvote

by andai2 hours ago|

[-]

But Claude made the website?

reply

upvote

by Alifatisk1 hours ago|

[-]

What conclusion are you drawing from that?

reply

upvote

by am17an2 hours ago|

[-]

This Claude front end skill is now soon to be slop.

reply

upvote

by auggierose1 hours ago|

[-]

Oh, I was wondering why all new websites look shitty in the same way.

reply

upvote

by ricardobeat2 hours ago|

[-]

Already is. Every new website looks exactly the same.

reply

upvote

by ankitwarbhe1 hours ago|

[-]

you created it yourself ?

reply

upvote

by Alifatisk1 hours ago|

[-]

No.

reply

upvote

by sergiotapia4 hours ago|

[-]

What AI model did you use for the website design? This is the second one I see with the exact same font and color scheme. Just curious because Claude models lean towards purples for example. Thank you!

reply

upvote

by pcwelder3 hours ago|

[-]

Opus 4.7 selects such palette and motifs by default. Might even be first iteration of claude design.

reply

upvote

by franga20004 hours ago|

[-]

This design still screams Claude to me, but a newer version than what you're thinking of. At some point they added a markdown file that tells it to use obviously AI designs like lots of blue/purple and gradients. Since then, this is its new style.

reply

upvote

by sheepscreek3 hours ago|

[-]

DeepSeek v4 perhaps?

reply

upvote

by FergusArgyll3 hours ago|

[-]

Frontend design skill by Anthropic specifically says not to use purple. I'd be surprised if it still uses purple. Have you seen that recently?

reply

upvote

by canadiantim4 hours ago|

[-]

So what's best low cost coding agent these days? Kimi 2.6? Qwen's latest closed model? Composer 2.5? DeepSeek?

reply

upvote

by throw1092053 minutes ago|

[-]

Cursor with Composer 2.5 seems to be competitive with frontier models (Opus and GPT-5.5) for a significant price discount. Benchmarks are gamed, as always, but $0.55/task vs $11.02 a task definitely indicates that there's some cost advantage.

https://cursor.com/evals

reply

upvote

by bwfan1234 hours ago|

[-]

In my experience, it is claude-code paired with deepseek-v4. For penny-pinchers like me, I can have long coding sessions with it with no anxiety about the cost. Also, prompting it to what you want and verifying the outputs is more important than the quality of the model. So, I am better off with a cheaper model and taking the responsibility for prompting it and verifying the results.

reply

upvote

by esperent3 hours ago|

[-]

It's obviously much cheaper paying by the token but how does it compare to a codex subscription on cost?

reply

upvote

by epolanski4 hours ago|

[-]

Can you quantify the actual costs in a week and the use you make?

reply

upvote

by wongarsu3 hours ago|

[-]

Not GP, but for my use I'd estimate $0.10-0.30 per hour of use per agent with DeepSeek v4 Pro

reply

upvote

by passive4 hours ago|

[-]

I've gone through ~600m tokens in Xiaomi Mimo though Claude, and it's been the most effective use of an agent I've had yet. It's very capable, but generally not ambitious, picking simple but effective solutions to most problems I give it. Going to write something longer about the experience when I get to a billion tokens.

reply

upvote

by Alifatisk3 hours ago|

[-]

I do have my eyes on the coding plan, which is quite generous.

https://mimo.mi.com

reply

upvote

by gandreani4 hours ago|

[-]

Are you using Mimo 2.5 pro?

reply

upvote

by passive3 hours ago|

[-]

Yes. I tried a couple of weeks with non-Pro, and it was pretty good, but I had too many spare tokens, so I switched back to Pro. :)

reply

upvote

by skeledrew4 hours ago|

[-]

Seems to be DeepSeek.

https://news.ycombinator.com/item?id=48237663

reply

upvote

by abalashov3 hours ago|

[-]

Although I have little interest in agentic coding, when I do use it, I have found Kimi K2.6 to give Opus-quality output, and have switched entirely to it for pretty much everything.

reply

upvote

by throw109202 hours ago|

[-]

I've used Opus extensively and tried K2.6 on a few projects, and the gap is huge. K2.6 is nowhere near the performance of Opus. That's fine because it's also far cheaper, but public benchmarks line up with my own personal experience that they aren't comparable in terms of intelligence.

(that is, different places on the Pareto efficiency graph)

reply

upvote

by ac294 hours ago|

[-]

Kimi 2.6 is great. Qwen3.7-max benchmarks similarly but I havent used it yet

reply

upvote

by stavros4 hours ago|

[-]

For me, it's by far Deepseek. It's many times cheaper than competitors, and about as good as Sonnet 4.6.

reply

upvote

by fouric2 hours ago|

[-]

I'd generally agree about Deepseek being as good as Sonnet - but I have extreme trouble with prompt compliance with V4 Pro in a way that I've never had with Sonnet. I'll tell it "find the bug, but don't fix it" or "please use this tool I just developed" and it'll ignore me a high fraction of the time.

It's bad enough that I'm working on guardrails at the harness level because prompting appears to be useless.

Do you have the same issue?

reply

upvote

by stavros2 hours ago|

[-]

I have Opus make a fairly detailed plan, then Deepseek implements, and GPT reviews. With that setup, I have zero issues, probably because what you mention is handled (the plan keeps it on track and the reviewer catches any issues).

Now that you mention it, though, I have seen it do a few things that weren't in the plan. The reviewer caught them, though, so they didn't cause a problem, and it's so cheap that overall it's a massive improvement.

reply

upvote

by lostmsu4 hours ago|

[-]

Just use codex with 5.5 on low reasoning levels

reply

upvote

by 3 hours ago|

[-]

deleted

reply

upvote

by WhereIsTheTruth55 minutes ago|

[-]

Y'all should not be writing js/ts/slop/npm based clis anymore

It's the agentic era, pick a better option

Just stop

reply

upvote

by Alifatisk29 minutes ago|

[-]

Whats that option?

reply

upvote

by aplomb10261 hours ago|

[-]

[flagged]

reply

upvote

by benjiro30001 hours ago|

[-]

[dead]

reply

upvote

by the_mitsuhiko4 hours ago|

[-]

[dead]

reply