#!/bin/sh
export ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic
export ANTHROPIC_AUTH_TOKEN=sk-secret
export ANTHROPIC_MODEL=deepseek-v4-flash
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
exec claude "$@"

This is what I've been using for non-confidential projects for about a week now (soon after V4 came out). I honestly can't tell the difference, but I'm not doing anything crazy with it either.
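A usage sketch, in case it helps anyone: save it somewhere on your PATH (dsclaude is just my name for it) and it drops in anywhere you'd normally type claude. Note the quoted "$@" so arguments with spaces survive.

chmod +x ~/bin/dsclaude
dsclaude -c                      # continue the last session, now on DeepSeek
dsclaude "explain this repo"     # one-off prompt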
Worth noting that I don't think DeepSeek's API lets you opt out of training. Once this is up on other providers though… (OpenRouter is just proxying to DeepSeek atm)
Also, you can use HuggingFace Inference for DeepSeek V4 or Kimi K2.6, both of which work quite well and route through providers that you can enable/disable (like Together AI, DeepInfra, etc) - you'll have to check their policies but I think most of those commercial inference providers claim to not train on your data either.
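If you want to poke at that route directly, here's a minimal curl sketch against HF's OpenAI-compatible router; the model slug and the :together provider suffix are my guesses, so check the current HF docs:

curl https://router.huggingface.co/v1/chat/completions \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-ai/DeepSeek-V4:together", "messages": [{"role": "user", "content": "hello"}]}'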
Why is that?
IIRC, US data protection law protects the data of US citizens only; foreigners' data is not protected, and companies are not even allowed to disclose when they collect it.
HN is an American site. If you look at the US government, it will fearmonger about anything China-related, because they haven't had a genuine competitor in decades and they're scared and lashing out. Most US news outlets just parrot the government line, sometimes more so than state TV would, and that's reflected here.
I also feel comfortable saying that many Americans don't care one bit what happens to foreigners, be it by action of their government or companies.
Also, the author checked in their apparently effective social media advertising plan: https://github.com/aattaran/deepclaude/commit/a90a399682defc... (which seems to be working)
I have been wanting to try CC with different models since Opus went downhill last month.
What limitations or issues, if any, have you noticed when using DeepSeek with Claude Code?
Is the Flash version on the level of GPT 5.4 mini?
PS: I mention Amp because I used to use it and I pay directly for tokens. I topped up $5, so I'll keep using it and see how far that takes me. But my impression so far is that even once model subsidization ends, these open-source models are quite viable alternatives.
My understanding is that DeepSeek V4 Pro is going to be uniquely good at working on consumer platforms with SSD offload, due to its extremely lean KV cache. Even if you only have a slow consumer platform, you should be able to just let it grind on a huge batch of tasks in parallel entirely unattended, and wake up later to a finished job.
AIUI, people are even experimenting with offloading the KV cache itself to storage, which may unlock this batching capability even beyond physical RAM limits as contexts grow. (This used to be considered a bad idea with bulky KV caches, due to concerns about wearout and performance, but the much leaner KV cache of DeepSeek V4 changes the picture quite radically.)
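Rough numbers, using DeepSeek-V3-era MLA dimensions since I haven't seen V4's published: MLA stores a ~576-dim compressed latent per token per layer instead of full K/V heads, so at 61 layers and 2 bytes/value that's 61 × 576 × 2 ≈ 70 KB per token, or about 9 GB for a 128K context. A naive MHA layout at 128 heads × 128 dims would be 2 × 128 × 128 × 61 × 2 ≈ 4 MB per token, roughly 57× more. That's the gap that makes SSD offload plausible at all.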
AI currently works basically by sending your entire codebase, workflow, and internal communication over the internet to some third-party provider, and your only protection is some legal document saying they pinky promise they won't train on your data.
And said promise is made by people whose entire business model relies on being able to slurp up all the licensed content on the internet and ignore said licensing, on the defense of being too big to fail.
> AIUI, people are even experimenting with offloading the KV cache itself to storage, which may unlock this batching capability even beyond physical RAM limits as contexts grow.
Especially this point. Is there any reason this idea was considered bad? Is it due to the speed difference between GPU VRAM and system RAM?
> Any reason that this idea was considered bad?
Because the KV cache was too big, even for a small context. This is still an issue with open models other than DeepSeek V4, though to a somewhat smaller extent than used to be the case. But the tiny KV of DeepSeek V4 is genuinely new.
The token cost makes it tempting to use for token-heavy tasks like this
Model inference was never subsidized. Inference is highly profitable at today's prices; that's why you have many inference providers. My guess is that inference prices will go down as more competition cuts into the margin.
It's model training, development and R&D that cost a lot, and companies creating closed models don't have any business model except astroturfing and trying to recover training costs through overpriced inference.
https://api-docs.deepseek.com/quick_start/agent_integrations...
Also the author checked in their advertising plan: https://github.com/aattaran/deepclaude/commit/a90a399682defc...
Why would you keep using the CC CLI if you want to use the much cheaper DeepSeek V4 models (Flash and Pro)? Isn't this the opportunity to kiss the CC CLI goodbye and use something not controlled by Anthropic?
Anyone here successfully moved from CC CLI to a fully open-source project? I'm asking this as a Claude Code CLI (Sonnet/Opus) user. My "stack" is all open-source: from Linux to Emacs to what-have-you. I'd rather also have open-weight models and a fully open-source (not controlled by a single company) AI CLI.
Any suggestion for something that works well? (by "well" I mean "as well as Claude Code CLI", which is not a panacea so my bar ain't the end of the world either).
But I guess Anthropic is just not capable of implementing an OpenAI-API-compatible client in Claude Code.
This is a heavily subsidized price and will only last until the end of the month: "The deepseek-v4-pro model is currently offered at a 75% discount, extended until 2026/05/31 15:59 UTC." [0]
The "supported backends" table is also deceiving -- while OpenRouter's server's may be in the US, the only way to get the $0.44/$0.87 pricing is to pass through to the DeepSeek API, which of course is China-based. [1]
I do think the model is quite good, I myself use it through Ollama Cloud for simple tasks. But I think some folks have bought in a little too much to the marketing hype around it.
[0] https://api-docs.deepseek.com/quick_start/pricing [1] https://openrouter.ai/deepseek/deepseek-v4-pro/providers
From what I see while building my own agentic system in Elixir, the problem is training for your specific harness/contracts. Claude/GPT-style models seem to be trained around the very specific contracts used by the harness: tool-call formats, planning structure, patching, reading files, recovering from errors, and knowing when to stop.
In practice, you either need a very strong general model that can infer and follow those contracts (expensive), or a weaker model that has been fine-tuned/trained specifically on your own agent contracts. Otherwise, the whole thing becomes flaky very quickly. And I suspect that with DeepSeek V4 you end up with the latter option.
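To make "contract" concrete: a harness typically expects the model to emit tool calls in one exact schema, something like this single line (tool name and fields invented for illustration):

{"tool": "read_file", "arguments": {"path": "lib/parser.ex", "limit": 200}}

A model trained on a different wrapper (say, OpenAI-style function_call objects) emits something almost-but-not-quite parseable, the harness rejects it, and you get exactly the flakiness described above.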
I finally got fed up and started using GPT 5.5 over the past 4 days, and it's a breath of fresh air despite feeling much more minimal. With Claude I had to write so many hooks to enforce behaviors it wouldn't remember and lacked common sense about. GPT 5.5 does a much better job with things like knowing the AWS CDK CLI can hang on long CloudFormation deployments and that it should actively check the deployment status via the CloudFormation API rather than hanging for 30+ minutes - and it does all this without asking.
Maybe there's better tooling built into Codex too, but at least on the surface level it seems like how smart the model is makes a significant difference because Claude has more tools than I can count and still struggles to use "grep".
Edit: Like just now - I can't tell you how many times a day I see this sequence:
"Sorry, I'll run in parallel"
"Error editing file"
"File must be read first"
Repeat 10x for the 10 subagents Claude spawned and then it gets stuck until you press escape and it says "You rejected the parallel agents. Running directly now"
It's proven useful for me, and I figure others might appreciate how light of a shim it is between you and the models.
I personally didn't find it to be competitive with Claude Code as a harness. Can I ask how you modified it to perform better?
- Claude-style subagents
- an MCP layer for higher-level tools
- Cursor-style control-plane modes like Ask, Plan, Debug, and Build
The MCP layer lets the harness use things like GitHub file/code read, PR creation, web search/fetch, structured user questions, plan-mode switching, user skills, and subagents.
So the improvement is mostly from better UI/UX orchestration and tool access. There are some things from hermes that are interesting as well.
Most of my focus has been on applying this stack to sandboxed cloud agents so you can properly code and work from mobile devices.
I can't definitively say that the stack is better or worse than Claude code, more just tuned for my use case I guess.
The main pieces I've integrated for mouse.dev inspired by claude/cursor was plan mode, agent questions, subagents, pre/post hooks, context compaction, repo-local skills, and permission modes. So mostly tools like enter_plan_mode, ask_user_question, and spawn_subagent, plus .mouse/skills and .mouse/plans.
One nice feature is continuity. If you’re working on desktop and save a plan to .mouse/plans, you can pick it up later on mobile with cloud agents, or do the reverse. You can plan something from your phone, then when you’re back at your desk, review it/build it. That was my initial goal with this project because I've found the plan act loop so helpful.
Mouse Cloud Agents is mostly an OpenCode-based harness, but everything routes through our MCP/event system so it’s mobile-first and provider-agnostic.
I intentionally skipped a lot of IDE and Claude Code style desktop features. The bet is that this new style of coding is becoming less “edit files in an IDE” and more steer a capable coding chatbot.
Would love to hear from anyone reading that's iterating on harness architecture, it's been really fun to work on.
Looked into this one. Thought it was suspicious that it only had 7 open issues on github. Turns out they have a bot that auto-closes every single issue just because.
I honestly have no words.
> Maintainers review auto-closed issues daily and reopen worthwhile ones. Issues that do not meet the quality bar below will not be reopened or receive a reply.
Seems like not an unreasonable way to deal with the problem of large numbers of low quality issues being submitted.
It's crap either way.
Like, if they're going to sort through all the issues eventually (as they claim), why not just close the ones that aren't worthy when they get to them, instead of closing everything by default?

Is it just so the project doesn't show open issues on its GitHub page? Because they're still open issues in reality if the maintainer will eventually go through them.

Nothing is "unreasonable" in the sense that an open-source project has the right to set its own rules, but it's definitely a weird stance.
It is a guardrail against burnout and tracker spam. It's based on their implied perspective that the majority of submissions don't follow those guidelines, which is what determines their quality threshold.
https://github.com/badlogic/pi-mono/blob/main/CONTRIBUTING.m...
If all open issues are actionable items, that makes expected workload a lot easier to handle.
If most open issues are actually in a "needs triage / needs review" state, you lose the signal in the noise.
The issue tracker for a project exists primarily as a tool for maintainers, not for outsiders. Yes, the maintainers could change their workflow to create a new view that only shows triaged tickets.
Or, they could ensure the default 'open' view serves their needs.
https://github.com/badlogic/pi-mono/blob/main/CONTRIBUTING.m...
- https://news.ycombinator.com/item?id=46930961
- https://github.com/mitchellh/vouch
While those are nice, Claude Code has the largest amount of plugins and skills I want to use.
If you look at the terminal-bench@2.0 leaderboard, you'll quickly see it's actually one of the weakest agentic harnesses. Anthropic's own models score lower with Claude Code than with virtually any other harness.
So it's quite the opposite. Claude Code is arguably the worst harness to run models with.
https://debugml.github.io/cheating-agents/#sneaking-the-answ...
Yes, and this is a temporary discount; the price goes up to $3.48 after 2026/05/31 15:59 UTC.
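(The arithmetic checks out: $0.87 × 4 = $3.48, so the $0.87 quoted above reads as the 75%-off output price and $3.48 the post-discount one; assuming the discount applies uniformly, the $0.44 input price would go to $1.76.)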
No sub-agents. There's many ways to do this. Spawn Pi instances via tmux, or build your own with extensions, or install a package that does it your way.
No permission popups. Run in a container, or build your own confirmation flow with extensions inline with your environment and security requirements.
No plan mode. Write plans to files, or build it with extensions, or install a package.
No built-in to-dos. Use a TODO.md file, or build your own with extensions.
No background bash. Use tmux (see the sketch below). Full observability, direct interaction.
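For the tmux route, a minimal sketch, assuming the agent is invoked as pi (check the repo for the actual binary name and flags):

tmux new-session -d -s worker1 'pi "refactor the auth module"'
tmux new-session -d -s worker2 'pi "add tests for the parser"'
tmux attach -t worker1      # watch or take over a session
tmux kill-session -t worker2

You get the observability for free: every "subagent" is just a shell you can attach to.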
I could see a serious cost reduction story by using opus for design and deepseek for implementation.
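A sketch of what that could look like with Claude Code's model-slot env vars, routing both models through OpenRouter so one key covers both providers (the model slugs are guesses; check OpenRouter's catalog):

export ANTHROPIC_BASE_URL="https://openrouter.ai/api"
export ANTHROPIC_AUTH_TOKEN="$OPENROUTER_API_KEY"
export ANTHROPIC_DEFAULT_OPUS_MODEL="anthropic/claude-opus-4.5"
export ANTHROPIC_DEFAULT_SONNET_MODEL="deepseek/deepseek-v4-pro"
claude

Then plan with the Opus slot and switch down to the DeepSeek-backed slot with /model for the implementation passes.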
Personally I would avoid anthropic entirely. But I get why people don't.
Yeah so would I, I do miss having vision tools sadly.
Probably wasn't clear enough if you don't know what that is already, apologies
It's an Asus Ascent GX10, which is a little mini PC with 128GB of LPDDR5X as shared memory for an Nvidia GB10 "Blackwell" (kind of, it's a long story) GPU and a MediaTek ARM CPU
could you tell me the long story?
edit: or wait, is it quasi-Blackwell the way all DGX Sparks are quasi-Blackwell? like the actual silicon is different but it's sorta Blackwell-shaped?
The promise of this chip was “write your code locally, then deploy to the same architecture in the data centre!”
Which is nonsense, because the GB10 is better described as “Hopper with Blackwell characteristics” IMO.
Still great hardware, especially for the price and learning. But we are only just starting to get the kernels written to take advantage of it, and mma.sync is sad compared to tcgen05
All doable, but all vaguely squishy and nuanced problems operationally. Kinda like harness design in general.
Once you've found the path, patches are trivial and the savings are tiny unless you're doing refactoring/cleanup.
testing gets more and more complicated. Take a look at opencode go, and you see this:
> Includes GLM-5.1, GLM-5, Kimi K2.5, Kimi K2.6, MiMo-V2-Pro, MiMo-V2-Omni, MiMo-V2.5-Pro, MiMo-V2.5, Qwen3.5 Plus, Qwen3.6 Plus, MiniMax M2.5, MiniMax M2.7, DeepSeek V4 Pro, and DeepSeek V4 Flash
and now you're on your own with the bugs all of these models can produce at scale. Am I missing anything in this picture? What is the real use of cheaper models?
Maybe I need to switch to some news publication that actually does real research and writing still. Because public forums like this have been completely destroyed by LLMs.
If we touch grass in person and swap certificate requests, we can actually rebuild a trust network.
This is a pretty old problem with regards to clubs / secret societies and whatnot. And with certificates / PKI, our modern security tools have solved all the technical problems.
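One concrete instantiation (GPG's web of trust rather than X.509 certificate requests, but the same idea) is the old key-signing-party flow, which still works today; alice@example.org is obviously a placeholder:

gpg --fingerprint alice@example.org   # verify the fingerprint in person, off her screen or paper
gpg --sign-key alice@example.org      # certify the key after checking ID
gpg --send-keys ALICE_KEY_ID          # publish your certification to a keyserver

The tooling was never the hard part; getting people in a room was.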
Claude Code, on the other hand, is the most subsidized one, both for consumers (through the Max subscription) and for enterprises (token discounts). It is also heavily optimized for cost, especially token caching and reduced thinking, at the expense of quality.
The best two are Codex and Forge Code.
However I am using plugins and skills that are only compatible with Claude Code or work best with Claude Code.
So, for me, Claude Code with plugins like claude-meme, Context Mode, Superpowers and Get Shit Done is better than other tools.
I think everyone should test multiple models and multiple agent harnesses for their specific needs, codebase, and way of working.
do you have any benchmarks on:

- token usage over time
- failures/retry rates
would be great to see how it behaves in production
https://api-docs.deepseek.com/quick_start/agent_integrations...
ANTHROPIC_BASE_URL="https://openrouter.ai/api" ANTHROPIC_AUTH_TOKEN="$OPENROUTER_API_KEY" ANTHROPIC_DEFAULT_SONNET_MODEL="deepseek/deepseek-v4-flash" CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 claudeI was able to use it in agent mode with Roo, I stopped after having it write out a plan, but I'll continue when I have more time.
Deepseek feels less likely to do a straight up rug pull since you can self host with enough money, but I'm still more excited about local solutions.
Usually I just need grunt work done. I'm not solving difficult problems.
Not only can it seamlessly and dynamically switch between DeepSeek V4 Flash, V4 Pro, and other mainstream models within the same context, but it is also 100% compatible with Claude Code.
So I think I'll stay with CC for now.
If you are interested, I've built an agentic terminal that helps manage these types of things better: https://deepbluedynamics.com/hyperia
I am using Claude Code with GLM 5.1, MiniMax M2.7, Kimi K2.6 and Xiaomi MiMo V2.5 Pro.
The edge Anthropic has over others lies in its models' performance. Its CLI tooling (and obviously its pricing) is definitely not better than the others'.
[1] A fancier way of saying "reducing cost."
(I have no experience with running anything locally, maybe it's a stupid question)
Also, opencode tracks you by default. It's not safe. Every first prompt you send is routed through their servers and logged, and they can use your data however they want.
Ask AI to look at the code for you
We don't have any proof they're not logging everything, by default we should assume they are
The American firms are not demonstrating escape velocity, and as long as China offers something somewhat comparable at a very low price to compensate for any difference in quality, they will not generate enough cash flow to finance reinvestment. I highly doubt they'll be able to keep raising external financing for many more periods from here on out - they have to start showing strong financials and that they are pulling away from the open-source models.
Already DeepSeek v4 is being hosted on Huawei Ascend 950. What do you think those cost relative to NVIDIA gear?
Not only that but other countries are very unlikely to follow suit, so it is just a straight-up productivity tax on the US.
That surprised me too. The intelligence is at the client, and by making that open, Anthropic has commoditized the coding agent.