This is what I’ve been using for non-confidential projects for about a week now (soon after v4 came out). I honestly can’t tell the difference, but I’m not doing anything crazy with it either.
Worth noting that I don’t think DeepSeek‘s API lets you opt out of training. Once this is up on other providers though… (OpenRouter is just proxying to DeepSeek atm)
Also, you can use HuggingFace Inference for DeepSeek V4 or Kimi K2.6, both of which work quite well and route through providers that you can enable/disable (like Together AI, DeepInfra, etc) - you'll have to check their policies but I think most of those commercial inference providers claim to not train on your data either.
Why is that?
IIRC, USA data protection protects data of US citizens only, foreigners data is not protected, and the companies are not even allowed to disclose when they collect those data.
HN is an American site. If you look at the US government, it is going to fearmonger about anything China related, because they haven't had a genuine competitor for decades and they're scared and lashing out. Most US news just parrot the government line, sometimes more so than state TV would, and so it reflects here.
I also feel comfortable saying that many Americans don't care one bit what happens to foreigners, be it by action of their government or companies.
People can have problems with America and I'm fine with that. But pretending China isn't subsidizing industry (land, education, transportation) in a predatory fashion is silly. Too many companies have gone out of business because of it. We can all have our friends in China without pretending the CCP is playing the ballgame fairly. The government doesn't need to point it out. That doesn't even get into influence operations (which are especially easy on platforms like this.)
Seriously - there may be a day in the future where Western nations and China get along but it really can't/won't happen while it's holding all the industry and trying to take the Services income as well.
This is true. There are also many of us who do care.
This brings to mind something I heard recently about the so-called "Rule of 10". There will always be 3 people who support you, 3 people who are against you, and 4 people who have no idea what's going on and don't care.
Don't just focus on the 3 people who are being negative.
Also, the author checked in their apparently effective social media advertising plan: https://github.com/aattaran/deepclaude/commit/a90a399682defc... (which seems to be working)
If it's something that'll be irrelevant in 3 months why should anyone care about it?
I have been wanting to try CC with different models since Opus went downhill last month..
What limitations or issues have you noticed when using DeepSeek with Claude Code if any?
is flash version on level of gpt 5.4 mini
PS. mentioning amp because i used to use it and I pay directly for token. I topped up 5 usd so I will be going to use it and see how far can it take me. But my impression so far is even when model subsidization is done, those open source models are quite viable alternatives.
My understanding is that DeepSeek V4 Pro is going to be uniquely good at working on consumer platforms with SSD offload, due to its extremely lean KV cache. Even if you only have a slow consumer platform, you should be able to just let it grind on a huge batch of tasks in parallel entirely unattended, and wake up later to a finished job.
AIUI, people are even experimenting with offloading the KV cache itself to storage, which may unlock this batching capability even beyond physical RAM limits as contexts grow. (This used to be considered a bad idea with bulky KV caches, due to concerns about wearout and performance, but the much leaner KV cache of DeepSeek V4 changes the picture quite radically.)
AI currently works basically by sending your entire codebase and workflow, and internal communication over the internet to some third party provider, and your only protection is some legal document say they pinky promise they won't train on your data.
And said promise is made by people whose entire business model relies on being able to slurp up all the licensed content on the internet and ignore said licensing, on the defense of being too big to fail.
> AIUI, people are even experimenting with offloading the KV cache itself to storage, which may unlock this batching capability even beyond physical RAM limits as contexts grow.
Especially this point. Any reason that this idea was considered bad? Is it due to the speed difference between the GPU VRAM to the RAM?
> Any reason that this idea was considered bad?
Because the KV cache was too big, even for a small context. This is still an issue with open models other than DeepSeek V4, though to a somewhat smaller extent than used to be the case. But the tiny KV of DeepSeek V4 is genuinely new.
The token cost makes it tempting to use for token-heavy tasks like this
Model inference was never subsidized. Inference is highly profitable with today's prices. That's why you have many inference providers. My guess, the prices for inference will go down, as more competition starts cutting the margin.
It's model training, development and R&D that cost a lot, and companies creating closed models don't have any business model except astroturfing and trying to recover training costs through overpriced inference.