ANTHROPIC_MODEL=deepseek-v4-pro[1m] ANTHROPIC_SUBAGENT_MODEL=deepseek-v4-flash

This is what I’ve been using for non-confidential projects for about a week now (soon after v4 came out). I honestly can’t tell the difference, but I’m not doing anything crazy with it either.

Worth noting that I don't think DeepSeek's API lets you opt out of training. Once this is up on other providers though… (OpenRouter is just proxying to DeepSeek atm)
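For reference, here's the whole thing as a minimal sketch. The base URL is my assumption of where DeepSeek's Anthropic-compatible endpoint lives (check their docs); the model names are the ones above:

```sh
# point Claude Code at DeepSeek's Anthropic-compatible API
export ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic   # assumed endpoint
export ANTHROPIC_AUTH_TOKEN=sk-...                             # your DeepSeek API key
export ANTHROPIC_MODEL="deepseek-v4-pro[1m]"
export ANTHROPIC_SUBAGENT_MODEL=deepseek-v4-flash
claude
```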

reply
For those who don't want their data trained on, OpenRouter allows account-wide or per-request routing with either `provider.data_collection: "deny"` or `zdr: true` (zero data retention).

Also, you can use HuggingFace Inference for DeepSeek V4 or Kimi K2.6, both of which work quite well and route through providers that you can enable/disable (like Together AI, DeepInfra, etc.). You'll have to check their policies, but I think most of those commercial inference providers claim not to train on your data either.
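A per-request sketch of those knobs, assuming OpenRouter's usual chat completions endpoint (both provider preferences are the ones named above; the model slug is my guess, for illustration only):

```sh
# deny data collection and require zero-data-retention routing for one request
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v4-pro",
    "provider": { "data_collection": "deny", "zdr": true },
    "messages": [{ "role": "user", "content": "hello" }]
  }'
```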

reply
That doesn't work: if you do that, it marks DeepSeek's models with a warning symbol along with the error "paid model training violation".
reply
In a sense, it's working as intended. If you set `zdr` to `true`, you currently can't use DeepSeek V4. However, once other providers offer it (it is an open model, after all), some may allow ZDR.
reply
I wonder why the question of data security and training comes up so often with DeepSeek, Kimi, and GLM, and never with Anthropic, OpenAI, and Google models.

Why is that?

IIRC, US data protection law covers the data of US citizens only; foreigners' data is not protected, and companies are not even allowed to disclose when they collect it.

reply
Because Anthropic, at least, gives you the option to opt out of training? I think Google and OpenAI do, too.
reply
> USA data protection protects data of US citizens only, foreigners data is not protected

HN is an American site. If you look at the US government, it is going to fearmonger about anything China-related, because they haven't had a genuine competitor for decades and they're scared and lashing out. Most US news outlets just parrot the government line, sometimes more so than state TV would, and so it's reflected here.

I also feel comfortable saying that many Americans don't care one bit what happens to foreigners, be it by action of their government or companies.

reply
[delayed]
reply
Wolf Warrior diplomacy isn't even 10 years dead. The HK treaty was violated and continues to be. Taiwan gets threatened every other week.

People can have problems with America and I'm fine with that. But pretending China isn't subsidizing industry (land, education, transportation) in a predatory fashion is silly. Too many companies have gone out of business because of it. We can all have our friends in China without pretending the CCP is playing the ballgame fairly. The government doesn't need to point it out. That doesn't even get into influence operations (which are especially easy on platforms like this).

Seriously - there may be a day in the future when Western nations and China get along, but it really can't/won't happen while China is holding all the industry and trying to take the services income as well.

reply
> I also feel comfortable saying that many Americans don't care one bit what happens to foreigners, be it by action of their government or companies.

This is true. There are also many of us who do care.

This brings to mind something I heard recently about the so-called "Rule of 10". There will always be 3 people who support you, 3 people who are against you, and 4 people who have no idea what's going on and don't care.

Don't just focus on the 3 people who are being negative.

reply
Oh absolutely.
reply
As of now, OpenRouter offers multiple providers for DeepSeek with ZDR (not sure if they actually respect it, but still).
reply
At several times the price of DeepSeek, though, so it's a tradeoff... Even then, Pro is still cheaper than Haiku.
reply
I wanted to try this. To bring back Opus and Sonnet, do I just unset those env vars?
reply
Yes, this is pretty much just rerouting Claude Code to call DeepSeek's Anthropic-compatible endpoints instead of its own defaults. Once the variables are removed, it'll work just like before.
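Concretely, assuming you exported the variables in your shell (rather than putting them in Claude Code's settings):

```sh
# back to the Anthropic defaults (Opus/Sonnet)
unset ANTHROPIC_BASE_URL ANTHROPIC_AUTH_TOKEN ANTHROPIC_MODEL ANTHROPIC_SUBAGENT_MODEL
```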
reply
Correct.
reply
The more interesting part of deepclaude is the local proxy it runs to switch models mid-session and do combined cost tracking. Though these features seem quite buried in the LLM-generated README. Looking at the history, it appears they were added later and the README wasn't restructured to highlight them.

Also, the author checked in their apparently effective social media advertising plan: https://github.com/aattaran/deepclaude/commit/a90a399682defc... (which seems to be working)

reply
How come such slop is allowed here? What value do these vibe-coded, zero-shot "projects" add? Why not just post the prompt?
reply
For the same reason that GitHub has a releases page for uploading binaries.
reply
Seriously. When I first looked, this project's first commit had been pushed two hours prior. Projects should be at least 3 months old or automatically removed.
reply
But then that would have the downside of falsely blocking projects that were developed in private and then pushed to GitHub (or any public repo). For instance, I always use my own self-hosted Forgejo for everything by default.
reply
If it's a project you actually care about and are actively working on it'll be just as good 3 months from now.

If it's something that'll be irrelevant in 3 months why should anyone care about it?

reply
If you develop on your own private instance and then mirror to GitHub to release it, there will be 3 months of git history in the logs.
reply
deleted
reply
Convenience? Am I supposed to take the prompt and use my own tokens on it? Why should I have to do that?
reply
Recruiters used to use a candidate's GitHub "sources" page to evaluate them, as a kind of proof-of-work.
reply
And recruiter agents still do.
reply
[flagged]
reply
It seems like any project that makes fun of Claude is bound to reach the top spot on Hacker News. Even if it’s just a project consisting of four lines of code.
reply
You're just mean. I count 6 lines of code!
reply
deleted
reply
[dead]
reply
So I created https://getaivo.dev, which lets you use any model in the coding agent directly. Just `aivo claude -m deepseek-v4-pro`.
reply
Does it support AWS Bedrock as a provider? Can I use any model with it?
reply
Ah, for AWS Bedrock, just use `aivo keys add` to add the base URL and API key, and everything is ready. Run `aivo models` to see the models.
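Putting this sub-thread's commands together (I haven't verified these myself; usage is as described above):

```sh
aivo keys add                    # interactive: add a base URL and API key
aivo models                      # list the models now available
aivo claude -m deepseek-v4-pro   # run Claude Code against the chosen model
```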
reply
Currently no, but it can be added
reply
This, in essence, is what allows one to use any model with CC -- including local ones.
reply
I know. I'm struggling to understand how this warrants a GitHub repo / HN article. I've been using claude-code with a llama.cpp server and a dummy API key, and all that's required is defining two environment variables to point Claude at the local endpoint. Am I missing something?
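For anyone curious, roughly what that setup looks like. The model path and port are placeholders, and you should confirm your llama-server build actually exposes an Anthropic-compatible route:

```sh
# serve a local model with llama.cpp
llama-server -m ./my-model.gguf --port 8080

# the "two environment variables": point Claude Code at the local endpoint
export ANTHROPIC_BASE_URL=http://localhost:8080
export ANTHROPIC_AUTH_TOKEN=dummy   # any non-empty value
claude
```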
reply
Thanks, that was super easy.

I've been wanting to try CC with different models since Opus went downhill last month.

What limitations or issues have you noticed when using DeepSeek with Claude Code, if any?

reply
The AI wars have begun
reply
And they are enticing human agents to further their agendas using techniques learned from the white mice.
reply
This has been possible since the beginning.
reply
deleted
reply
Those of you using DeepSeek V4: what level of output do you get? Codex 5.3 or GPT 5.4 level?

And is the Flash version on the level of GPT 5.4 mini?

reply
I tried it on a non-trivial, but well-documented and self-contained, task. It did amazingly well. I used DeepSeek V4 Pro via the DeepSeek platform. The model is very fast and super cheap: I burned only 0.06 USD (I wonder what the same task would have cost me had I used, e.g., amp).

PS: I mention amp because I used to use it and pay directly per token. I topped up 5 USD, so I'll keep using it and see how far that takes me. But my impression so far is that even with model subsidization going on, these open-source models are quite viable alternatives.

reply
> But my impression so far is that even with model subsidization going on, these open-source models are quite viable alternatives.

My understanding is that DeepSeek V4 Pro is going to be uniquely good at working on consumer platforms with SSD offload, due to its extremely lean KV cache. Even if you only have a slow consumer platform, you should be able to just let it grind on a huge batch of tasks in parallel entirely unattended, and wake up later to a finished job.

AIUI, people are even experimenting with offloading the KV cache itself to storage, which may unlock this batching capability even beyond physical RAM limits as contexts grow. (This used to be considered a bad idea with bulky KV caches, due to concerns about wearout and performance, but the much leaner KV cache of DeepSeek V4 changes the picture quite radically.)

reply
Good. It's hard to overstate how nervous most executives are about relying on cloud-based providers.

AI currently works basically by sending your entire codebase, workflow, and internal communications over the internet to some third-party provider, and your only protection is a legal document saying they pinky-promise not to train on your data.

And said promise is made by people whose entire business model relies on being able to slurp up all the licensed content on the internet and ignore said licensing, with the defense of being too big to fail.

reply
Yes, this is the most straightforward argument for local AI inference. "Why buy cloud-based SOTA AI? We have SOTA AI at home." It's great that DeepSeek may now be about to make this possible, once the support in local inference frameworks is up to the task.
reply
Is there anywhere I can read about KV caches? Excuse my ignorance; I'm not familiar with this topic, and I've read scattered notes that DeepSeek's costs are well optimized due to how their KV cache works. But I want to read more about how the KV cache relates to the inference stack and where it actually sits.

> AIUI, people are even experimenting with offloading the KV cache itself to storage, which may unlock this batching capability even beyond physical RAM limits as contexts grow.

Especially this point. Any reason this idea was considered bad? Is it due to the speed difference between GPU VRAM and RAM?

reply
The KV cache generally grows linearly with your current context: it gets filled in with your prompts during prompt processing, and newly generated tokens get tacked on during token generation. LLM inference uses it to semantically relate the currently processed token to its pre-existing context.

> Any reason that this idea was considered bad?

Because the KV cache was too big, even for a small context. This is still an issue with open models other than DeepSeek V4, though to a somewhat smaller extent than used to be the case. But the tiny KV cache of DeepSeek V4 is genuinely new.
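For intuition, a back-of-envelope comparison with illustrative hyperparameters (loosely modeled on DeepSeek's published MLA design from the V3 era; fp16 throughout, so 2 bytes per element):

```sh
# conventional GQA cache: 2 (K and V) x layers x kv_heads x head_dim x 2 bytes
echo $(( 2 * 61 * 8 * 128 * 2 ))   # 249856 bytes, ~244 KiB per token

# MLA-style compressed latent: layers x latent_dim x 2 bytes
echo $(( 61 * 576 * 2 ))           # 70272 bytes, ~69 KiB per token
```

With these toy numbers, a 128K-token context costs roughly 30 GiB of cache versus under 9 GiB, which is why offloading it to slower tiers starts to look sane.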

reply
Have you used it for non-coding tasks via MCP, like Figma/Paper for design or Ableton MVP for sound design?

The token cost makes it tempting for token-heavy tasks like these.

reply
> even with model subsidization going on, these open-source models are quite viable alternatives.

Model inference was never subsidized; inference is highly profitable at today's prices. That's why you have many inference providers. My guess is that inference prices will go down as more competition starts cutting into the margin.

It's model training, development and R&D that cost a lot, and companies creating closed models don't have any business model except astroturfing and trying to recover training costs through overpriced inference.

reply
It's close to Opus 4.5 for me
reply