DSpark: Speculative decoding accelerates LLM inference [pdf] (github.com)

670 points by aurenvale 9 hours ago

by kamranjon 8 hours ago

DeepSeek continues to not only push the boundaries but also publish these incredible papers explaining how they achieved their gains - something the American labs no longer do unfortunately. Chinese labs are doing the most interesting work in AI right now.

by sigmar 3 hours ago

>publish these incredible papers explaining how they achieved their gains - something the American labs no longer do unfortunately.

Google is still releasing a lot of LLM architecture research. They introduced speculative decoding of LLMs in 2022[1], then released the code to perform speculative decoding for their Gemma 4 model this year[2].

[1] https://arxiv.org/abs/2211.17192

[2] https://github.com/google-gemma/cookbook/blob/main/docs/mtp/...
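For anyone unfamiliar with the technique in [1]: a small, cheap draft model (or a set of MTP heads) guesses several tokens ahead, and the large target model checks all of them in one forward pass, keeping the longest prefix it agrees with. Below is a minimal Python sketch of the greedy (temperature 0) case only. [1] uses a probabilistic accept/reject rule for sampling, and nothing here is taken from the DSpark paper itself. The `toy_target` and `toy_draft` "models" are hypothetical integer functions standing in for real networks.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token sequence to its greedy
# (argmax) next token. Real engines score all k draft positions with one
# batched target forward pass; here we call the target per position for clarity.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target-model call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes up to k tokens, the target
    keeps the longest agreeing prefix and then contributes one token of its own."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    while len(seq) < goal:
        # 1. The cheap draft model proposes tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2. The target verifies each proposed position.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            accepted.append(expected)  # always keep the target's own choice
            if tok != expected:
                break  # first disagreement: discard the rest of the draft
        else:
            # All draft tokens matched, so the same pass yields a bonus token.
            if len(seq) + len(accepted) < goal:
                accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq


# Toy, hypothetical "models" over digits, just to exercise the code.
def toy_target(seq: List[int]) -> int:
    return (sum(seq) * 7 + len(seq)) % 10


def toy_draft(seq: List[int]) -> int:
    guess = toy_target(seq)
    return (guess + 1) % 10 if len(seq) % 3 == 0 else guess  # wrong every 3rd position


prompt = [1, 2, 3]
assert speculative_decode(toy_target, toy_draft, prompt, 20) == greedy_decode(toy_target, prompt, 20)
```

Because every kept token is one the target would have produced anyway, the output matches plain greedy decoding exactly. The speedup comes from how many draft tokens get accepted per target pass, which is why a drafter is only worth shipping if it agrees with its target most of the time.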

by kamranjon 3 hours ago

Thanks for the clarification - Google does publish more than others - and I actually really appreciate the work they are doing with the Gemma models, which are truly competitive open models. I do wish they’d publish more in depth papers on their Gemma models but appreciate that they are open weights.

by DiabloD3 2 hours ago

They weren't the first to do MTP like this, and arguably did it wrong: the MTP heads are kept in a separate file and have to be welded in by the inference engine.

Qwen 3.6 shipped with working MTP first, and had working MTP in llama.cpp first.

by kcb 9 minutes ago

Nvidia's Nemotron 3 Super also shipped with MTP.

by spijdar 1 hour ago

Given the MTP drafter is basically a separate model, keeping it separate makes more sense IMO. It's out of my wheelhouse but it seems like you could adjust the MTP drafter model separately from the main model, too.

Ultimately though the real explanation, I think, is Google doesn't care since for their own purposes (in LiteRT-LM), they do bundle them. As far as I know, anyway.

by anaisbetts 53 minutes ago

I mean just like GGUFs aren't technically necessary yet are _way_ more convenient than using Safetensors and configuring the default Jinja prompt by-hand, it makes sense to bundle the draft model too. For all intents and purposes, the only people who will train a draft model are the people who train the original model

by tomalaci 8 hours ago

Probably because American AI companies are on the hook for quite a lot of investment money. I think they are trying to find the magical moat to justify their valuation.

Revealing optimizations similar to these would pretty much reduce their competitive position.

by lwansbrough 8 hours ago

Chinese labs are also still behind, so they’re incentivized to collaborate and have no reason to do it in private.

I suspect their tune will change if they ever take the lead.

by c7b 6 hours ago

The question is also what game they're playing. Deepseek came out of a hedge fund. I think it's no coincidence that their publications tend to have a large impact on AI stock prices.

Destroying the growth story of overvalued stocks is an interesting investment strategy. It's not even new. Shortsellers understandably get terrible rep from execs, but their actions are more often in the public interest than you'd think. Normally it's exposing fraud, but here we get the really fortunate side benefit of what could eventually amount to the most significant contribution to the general software community since Linux.

by merelydev 6 hours ago

> The question is also what game they're playing. Deepseek came out of a hedge fund. I think it's no coincidence that their publications tend to have a large impact on AI stock prices.

It's revealing that they always seem to publish after some big announcement by American AI companies. But regardless, this is one of the benefits of a duopoly.

by nozzlegear 4 hours ago

No more revealing than OpenAI, Anthropic and Google always having some new model that just so happens to be waiting in the wings whenever their competitors announce their own model bump.

by jingpostmedia 6 hours ago

The framing that Chinese labs open-source because they're behind assumes it's purely a competitive tactic. But there's a structural dimension: DeepSeek operates under a completely different funding model than US labs. They're backed by a quantitative hedge fund that views AI as infrastructure, not as a product to monetize directly. The ROI for them comes from trading alpha, not API revenue.

Chinese AI companies also face a domestic market where open-source distribution is often the only way to reach enterprise clients who won't pay SaaS premiums. The business logic aligns with openness in a way that US labs' VC-funded models don't.

by NitpickLawyer 5 hours ago

> They're backed by a quantitative hedge fund that views AI as infrastructure, not as a product to monetize directly. The ROI for them comes from trading alpha, not API revenue.

That used to be true, but now they've raised ~$7B, so we'll see how / if that changes.

by disgruntledphd2 4 hours ago

Yeah, they were in a tough position though. All their competitors were offering equity and they didn't.

by yogthos 4 hours ago

Also, we’re seeing a classic commoditization spiral with open models rapidly closing the gap and driving prices towards the marginal cost of inference. The reality is that models themselves are general commodities and there's just not enough difference between them. A company can get ahead of others by a few months, but then the rest quickly close the gap. It's a really low margin business because there's no way to differentiate yourself.

Chinese companies understand this and they're treating models as shared infrastructure akin to Linux. The money is going to be in customization niches. Companies will charge to tune models for specific use cases and charge support for that. There's also going to be money at the bottom for hardware vendors making chips and memory. But the middle tier of generic LLMs is seeing involution where there's relentless competition driving profits towards the bottom.

by try-working 4 hours ago

Nope. It is purely a marketing and distribution strategy. Without open sourcing their models, their businesses would have never gotten off the ground. I've written about this here: https://try.works/writing-1#why-chinese-ai-labs-went-open-an...

by oefrha 8 hours ago

Which is a good thing. Self-serving motives are more reliable than altruistic ones.

by intended 7 hours ago

The world runs on incentives. Altruism/Self-serving are downstream of that.

Wikipedia is altruistic, and serves humanity quite well.

by theturtletalks 7 hours ago

Open-source is also altruistic. If DeepSeek does become self-serving once they get the top spot, it doesn’t take away from the altruistic contributions that they made towards open models.

by wqaatwt 6 hours ago

> Open-source is also altruistic

Contributing to it might not necessarily be. Most open source development is funded by large companies after all, and from their perspective it can function as a cost-saving measure: it allows them to focus on their core products and prevents their rivals from getting a competitive advantage from a superior low-level stack under their products.

Which is why open source is so successful in areas where software is a cost-center but mostly failed for consumer products (since spending resources on them would actually be altruistic unlike e.g. Linux kernel development)

by spongebobstoes 5 hours ago

altruism is not discernable from the outside

any altruistic act can be perceived as self serving

by brookst 7 hours ago

And ultimately the motivation for those contributions just doesn’t matter, except to those who like to anthropomorphize company and argue about their souls.

by Dibby053 6 hours ago

People who donated to OpenAI in its early years might disagree on that.

by kelipso 6 hours ago

Or if they want to do anything close to predicting what they will do in the future, like curious and interested humans tend to want to do.

by threethirtytwo 5 hours ago

No, parent is right. The core root driver of the world is capitalism; open source exists downstream of that.

Software engineers need money to survive. If they exclusively work on open source stuff where are they getting money from to survive? Follow the money trail… even a donation… eventually it leads to an incentive based source or action.

by mejutoco 50 minutes ago

> If they exclusively work on open source stuff where are they getting money from to survive?

From open source. You can earn money from open source. Open source is not opposed to capitalism, idk where you got that idea.

by microgpt 4 hours ago

Is it though? A large number of people get to experience a lot of power over others because they moderate Wikipedia. That's certainly why some of them do it, just like on Reddit

by nozzlegear 3 hours ago

I hate to quote pithy proverbs, but "the road to hell is paved with good intentions." One can have an altruistic goal which ends up harming people too, which is where that proverb comes from. Prohibition and The War on Drugs in the US are two good examples of something that had altruistic origins[†] but ended up doing way more harm than good.

[†] Another problem with altruism: we don't all agree on whether a goal is altruistic, and what's altruistic in the enactor's eyes might not be in yours. Curating a fountain of human knowledge like Wikipedia? Probably altruistic. Protecting humanity from itself by installing your company as the stewards of frontier LLMs? Not so altruistic in my view.

by dragonwriter 3 hours ago

> Prohibition and The War on Drugs in the US are two good examples of something that had altruistic origins

The War on Drugs had the purpose (not just in its origin but in its perpetuation) of inflicting harm on elite-disfavored subsets of the population that could not be openly targeted for Constitutional reasons, which is about as far from an altruistic reason as it is possible to get.

by intended 3 hours ago

[dead]

by threethirtytwo 4 hours ago

This statement is factually true and you are voted down because many people lack knowledge.

Any individual that provides free labor cannot survive off of said free labor. He must work for money to survive or get donations from someone who earned that money from incentive based labor in order to even buy the food he needs to exist as a living human being. Much of the time that labor is actually closed source.

This is a logistical reality. A lot of open source advocates are unable to get their brains out of the whole mentality that open source literally cannot exist without incentive based software supporting it. Who pays for GitHub to exist? Who pays for the food swes eat? I just code for open source all day and money falls out of the sky.

My smart friend says there are jobs that pay you to work on open source exclusively. Smart guy. In this case you follow the money trail. How does that company get enough money to pay a guy to work exclusively on open source?

by docfort 2 hours ago

Free labor enables capitalism, especially if you consider labor arbitrage as a mixture of free labor and properly compensated (according to the real value) labor. From literally being born, to family culture, education, and whatever level of broad social cohesion, it’s all free labor. Without that background, money itself loses its value, since an individual cannot have reasonable confidence in trading it for something of actual tangible value. It is abstract stored value, banked into society for free. Indeed, in many cases, the free labor is in the rational self interest of a group. But stability and love and peace aren’t monetized to their true value. Otherwise, markets should be much less stable. Bubbles are only notable for the large impact of a small group of bad actors. Overall, it’s pretty amazing what free labor does. Open source is just another instance of this long and critical tradition.

by threethirtytwo 1 hour ago

Free labor is derivative of incentivized labor. Your statement here doesn’t disprove or counter what I said. Again, follow the money trail. Everything you said, if you follow the origin of the money, comes from paid, incentivized labor. Parents need money to raise kids… where do they get that money?? Our economy is called capitalism for a reason; there is literally zero reference to charity or altruism in the vocabulary or even the standard models that describe our economy and economic theory.

Put it another way: if we removed your ability to do incentivized labor and all you could do is charity work… you would run out of money and die from starvation. If we did the opposite and removed your ability to do charity work… you’d be fine.

All of this re-emphasizes the point of this thread: In our objective reality, the world is driven by incentive based work while altruism is a side effect of surplus wealth generated by incentive based work. That is the fundamental reality.

by Der_Einzige 4 hours ago

Go read Max Stirner. True "altruism" doesn't exist. It's all egoism, even if and especially if you think it's not.

by amelius 8 hours ago

You mean more predictable, not more reliable.

by threethirtytwo 5 hours ago

Disagree. It’s more reliable.

by rrvsh 8 hours ago

Could you explain? (asking in good faith)

by IshKebab 7 hours ago

I don't think so. I can confidently predict that altruism will give you a very unreliable income stream in the vast majority of cases.

by nubg 8 hours ago

Very interesting take

by broodbucket 8 hours ago

Look at how far OpenAI has drifted from their original mission. Everything comes back to greed, so it's ideal for the world if selfish motives happen to coincide with what's good for the world, like advancements in open models

by spongebobstoes 5 hours ago

can you elaborate? the original mission was "advance digital intelligence in a way that benefits all of humanity"

I don't see an inconsistency. money is pragmatic, the mission needs money

by threethirtytwo 1 hour ago

Every company on the face of the earth has a mission statement involving some bs goal that sounds altruistic. For a good example, look at Google's mission statement.

The real mission statement for most companies is to make as much money as possible.

by roenxi 8 hours ago

It's a standard take since it is how markets tend to work. They aren't powered by altruism, it is a big system for turning greed into good results. We don't have all this stuff because people suddenly woke up one morning and decided to be nice.

by breezybottom 7 hours ago

Yes but there's more to the world than markets.

by wqaatwt 6 hours ago

On aggregate mainly because humans often tend to behave “irrationally” due to various reasons though

by lelanthran 7 hours ago

I don't understand what is interesting about it: it's the default.

Markets don't run on altruism.

by woctordho 7 hours ago

And humans don't run on markets.

by wqaatwt 6 hours ago

Mostly they kind of do since we do live in an utopian society of unlimited abundance. Extremely few people can afford to (or want to) spend a very large number of working hours without ever getting anything directly in return for it.

by idiotsecant 6 hours ago

I think you made a typo of saying do instead of don't and totally reversed your argument

by throw1234567891 6 hours ago

Neither on altruism.

by FooBarWidget 7 hours ago

The standard is applied very inconsistently. Nobody accuses the local bakery of being motivated by profit, and that they don't bake bread for you out of altruism.

by AlecSchueler 7 hours ago

Isn't it the entire basis of capitalism?

by resters 5 hours ago

They are focused on the things you do when you are not over-capitalized and you can’t get unlimited nvidia hardware to train on. And the results speak for themselves.

Meanwhile we in the US are blocked from buying Huawei GPUs and retirees are boasting about the nvidia in their portfolios.

by tw1984 8 hours ago

> Chinese labs are also still behind, so they’re incentivized to collaborate and have no reason to do it in private.

US labs in Google, Meta and SpaceX are not leading, none of them managed to build something on par with GLM 5.2.

Care to explain to me why they still don't collaborate and still choose to do it in private?

by vidarh 8 hours ago

I'm not sure I'd put Google in that list, but either way: Because they think they have enough capital that they can catch up and don't need the reputational boost of this.

by CuriouslyC 8 hours ago

As good as Gemini's visual intelligence is, it's a terrible agent.

by 7speter 8 hours ago

Google at least still releases open source models to the public.

by VorpalWay 6 hours ago

Aren't they only open weights, not true open source?

by HarHarVeryFunny 5 hours ago

The concept of open source doesn't really apply to AI models since their behavior is mostly controlled by the data they were trained on and the complex ways they are trained. Having the source code of the model by itself wouldn't help you.

From a practical POV having all the training data, training infrastructure, and training know-how wouldn't help you either unless you could afford to spend the millions of dollars (hundreds of millions for a SOTA model) in compute to train it each time they released a new training set, in which case you're only talking about the big commercial companies. "open source for the people" just does not apply.

by re-thc 7 hours ago

Thank Apple?

Those are mostly for embedded devices and the current "sponsor" is Apple.

by wqaatwt 6 hours ago

Gemini 3.1 is still up there, though? If Google started to compete on price they could be very successful.

by budsniffer952 8 hours ago

Wait, are you claiming that these companies haven't contributed to the ecosystem via research and open source?

by lwansbrough 8 hours ago

No idea I don’t work there.

by threethirtytwo 5 hours ago

Also, historically, China has always viewed intellectual property as public property. Similar to open source.

by arj 6 hours ago

Not everyone is motivated by greed

by tirant 6 hours ago

What do you think is the underlying motivation?

by arj 5 hours ago

You ask me what I think. So far DeepSeek has been very consistently trying to advance state-of-the-art research in a transparent and public way by writing papers and publishing working code. They are also not at the mercy of the stock market in the same way many American companies are. Before anyone assumes too much, I live in Europe.

by re-thc 5 hours ago

> Chinese labs are also still behind, so they’re incentivized to collaborate and have no reason to do it in private.

Even if they're ahead they don't have enough GPUs to scale. Open sourcing is hence a good strategy to at least get market share (even if not $).

by alexander2002 5 hours ago

True!

by jmyeet 8 hours ago

Projection is a funny thing. It causes people to misread situations all the time. Southern slaveowners feared violent retribution from freed slaves, for example [1]. It was pure projection and said more about the South than it did the slaves. The reality was there was no violent retribution. It was the opposite where the former slaveowners continued to inflict violence on the formerly enslaved.

I say this because we see the same thing used as an argument against China. "If they overtake us, they'll do imperialism (like us)." Again, it says more about us than them.

A better reading (IMHO) of the situation is that China believes that AI shouldn't be used simply to mint a few more trillionaires but that the benefits should be shared with society. Why do I say this? Because we now have 70+ years of China doing exactly that. The transformation in China all the way from rural villages to Tier 1 cities has been utterly astounding. China has lifted ~800M people out of extreme poverty.

In some ways we're at a similar point to the late 1990s and 2000s when Microsoft execs complained that Linux, being free, destroyed intellectual property value. Linux should be a perfect example of how people can and do act altruistically, or at least not in a way to bait-and-switch to enrich themselves.

[1]: https://www.reddit.com/r/AskHistory/comments/1d26grm/in_the_...

by FooBarWidget 7 hours ago

It's even worse than that. China publishes stacks upon stacks of policy documents in which they explain clearly what they will do and why. This includes why they do poverty alleviation and why they believe big monopolies that own everything are bad. But almost no western observers care to read those documents. Instead, western observers, including HN, speculate endlessly about China's intentions, and "it would be naive to believe they would not do X" or drawing equivalences to Soviet Union or whatever. And the "journalists" sell this notion that Chinese state intentions are "untransparent" and "unknowable" while pretending the policy documents don't exist.

Meanwhile, Xi Jinping has published his 5th book on how governance in China works and what they're after. These are not books written for a western audience: they're compilations of speeches that he already gave to the Chinese party and state apparatus, so the contents are not sanitized for foreign audiences. But there are no English reviews or summaries of this 5th book at all by the usual China experts who shape what western audiences know about China.

This extends to beyond the government. Even though "for the people but only against the government" is an often-heard mantra, nobody seems to listen to what Chinese AI companies themselves say about why they publish open models. DeepSeek and GLM have said multiple times publicly what their motivations are, yet people on HN still speculate like they usually do.

Truly mind-boggling. I get that a lot of people don't like China. But setting aside the question of whether their dislike is justified, it would at least be rational to properly understand China, even if it's to defeat it. And listening to what China says themselves is absolutely essential for proper understanding. But people don't bother to? And they seem mostly happy with sticking to speculations that match preconceived notions, even if that hurts their chances of defeating China.

by isoprophlex 5 hours ago

Extremely interesting comment, thank you. Got some links where I can download this source material? I don't read or speak the language, but will try interrogating it with an LLM

by FooBarWidget 4 hours ago

The fifth book is on Amazon. https://www.amazon.com/XI-JINPING-GOVERNANCE-CHINA-V/dp/7119... It's already an English translation.

For something shorter, you can see Arnaud Bertrand's recent review. https://arnaudbertrand.substack.com/p/the-book-the-west-refu... The review is behind a paywall, but not expensive.

If you want to read policy documents directly (primary source), try the State Council / Chinese government policy database: https://www.gov.cn/zhengce/ and https://sousuo.www.gov.cn/zcwjk/policyDocumentLibrary

They also provide official translations: https://english.www.gov.cn/policies/

For Central Party documents: https://news.cn/politics/zywj/. It lists recent Central Committee / General Office / joint Party-State documents, e.g. 2026 documents on township duty lists, Party member development rules, carbon evaluation, long-term care insurance, and SOE leadership rules.

by isoprophlex 4 hours ago

Thanks again, this is more than enough for a clanker-assisted rabbit hole to disappear into

by jmyeet 7 hours ago

I 100% agree with you and want to add something.

If you simply take what the Chinese government says at face value, you will be correct way more often than 95% of Western policy wonks, media talking heads, "analysts" and so forth. Because, like you say, they tell you everything they're doing.

In the recent US-China summit, Xi Jinping just came out and used the Thucydides Trap metaphor, which tells you everything about where China thinks it is and where it sees the US going, which is to become increasingly belligerent as their power declines. Now whether or not you agree with that assessment (I do agree), it still tells you China wants to avoid open hostilities, it sees itself as continuing to rise and it fears what a declining US might do.

by FooBarWidget 7 hours ago

The Thucydides Trap mention is different from what you describe. Xi has dismissed the Thucydides Trap multiple times in the past as being hearsay and self-imposed bias (https://www.globaltimes.cn/content/944179.shtml). "We should strictly base our judgment on facts, lest we become victims to hearsay, paranoid or self-imposed bias. There is no such thing as the so-called Thucydides trap in the world. But should major countries time and again make the mistakes of strategic miscalculation, they might create such traps for themselves."

But western politicians keep raising this metaphor. So at some point they're like "okay we'll speak your language". They then used this metaphor to make the point "our rise isn't the threat, your fear of it is. If you resist it you're walking right into the trap Thucydides warned about". So your conclusion is still right, they don't want open hostilities, a stable world is in their interest.

Then western media ran away with this and were like "OMG Xi mentioned the Thucydides Trap", completely ignoring his point.

by colordrops 8 hours ago

So the marketplace is working.

by abc123abc123 8 hours ago

This is the way! Open source models will benefit, and once open source models reach the state of "good enough" the hyped up US AI companies will fear, since the availability of free, good enough, AI models will set the ceiling for how much they can charge. Then the bubble will pop.

by VorpalWay 6 hours ago

You mean open weights, I guess? There are as far as I know very few open source models, the training data is seldom released. Sadly.

by skeledrew 7 hours ago

Regardless of where they are, the Chinese will always share their progress, as they're collectivist/cooperative at their core, compared to the individualistic/competitive US.

by davedx 4 hours ago

I don't really see the moat for frontier AI labs being "more efficient models", although that could help their margins. I think moats will be built through horizontal and vertical market expansion, which Anthropic is doing the most of at the moment.

by baxtr 7 hours ago

Who is financing DeepSeek and what are they expecting in return?

by nmfisher 6 hours ago

Until recently, DeepSeek were self-financed (it was a spin-out from a hedge fund). They just raised ~50 billion RMB (US$7bn), and according to media [0] (which admittedly can be unreliable), the lead investors were:

1) The CEO himself 2) Tencent 3) CATL (the battery company) 4) NetEase (internet/media company) 5) JD.com (ecommerce) 6) Chinese investment firms

What are they expecting in return? I'd say the same thing that all those investors in OpenAI and Anthropic are expecting - profit.

[0] https://finance.sina.com.cn/stock/vcpe/2026-06-11/doc-iniazi...

by gniv 6 hours ago

I don't think this question would get to the reason. There could be one or two persons in charge who simply shape the culture of the company, including how much to publish.

by archerx 7 hours ago

They are self financed, the company that makes DeepSeek is a finance company that trades on the markets.

by rsanek 7 hours ago

The CCP's approach has historically been to subsidize their companies far more than other countries do. Why would LLMs be any different?

https://www.oecd.org/en/data/dashboards/magic-database-indus...

by declan_roberts 6 hours ago

Access to everything every American company feeds into the AI is well worth it to the CCP.

by eagleal 2 hours ago

Even the latest World Bank report, from the de facto neoliberal institution, recognized a couple of months ago that letting industries' focus be dictated purely by capital decisions was bad, as in _really_ bad.

by u8080 5 hours ago

According to EU statistics, yeah

by dgellow 5 hours ago

OECD isn’t the EU.

And regarding the dataset:

> Unlike most OECD databases, which rely on government data provided at country-level, the OECD MAGIC database uses firm-level data. The subsidy estimates included in the database are based on raw data obtained from firms’ annual reports, financial statements, bond prospectuses, IPO prospectuses, etc. The data are collected and verified manually by the OECD to maximise accuracy, consistency, and comparability. In some cases, additional information is also obtained from government databases, either to verify the firm-level information or to complement it. Care is taken to avoid double-counting where the data mix corporate and government sources.

by nixon_why69 5 hours ago

Does that figure hold up when we look at Silicon Valley financing? Uber alone was subsidized to the tune of billions. Let alone the recent batch where we're into hundreds of billions.

by baxtr 7 hours ago

Even if they were fully self-financed, which isn’t the case, they would expect something in return.

by nickthegreek 6 hours ago

You can give them money by using their api. Just because their model is open, doesn’t mean they are a non profit.

by archerx 6 hours ago

Not everyone has the American “fuck you got mine” zero sum game attitude. Also they’re making some of the American and European AI companies look bad which they can leverage with their trades if they wanted to.

by bushido 6 hours ago

IMHO to promote that China believes in free markets and making the technology available to all.

Which will likely help them bolster the sales of the MANY new AI chips in development/use in China to international markets. Dislodging Nvidia.

Kinda the opposite of what Jensen Huang (Nvidia) thinks US is doing: https://www.youtube.com/shorts/u3SY8nvjhQA

Edit: I'm a fan of DeepSeek and believe it's good to make the technology open/available. I also think it helps business, which I support as well.

Edit 2: No idea why I'm getting downvoted. That's also their official stance https://english.www.gov.cn/news/202601/08/content_WS695f1b55...

by panny6 hours ago

Short AI companies

???

Profit!

Not suggesting this is it, but you know, one possible angle.

by cromka8 hours ago

I seriously am far from fear mongering and doomsday mentality, but I just can't see how OpenAI and Anthropic can have a successful IPO if the quality gap between the free and paid continues to narrow like that...

by cyanydeez8 hours ago

[flagged]

by 720273729208 hours ago

[dead]

by 28383838388 hours ago

[flagged]

by zhoBEENG5 hours ago

For real. Reading old comment threads makes me sad, because the level of discourse was so much higher in the past. Although this place is still deeply appreciated, it’s clear that its culture is going monotonically towards Reddit.

Is there anywhere public anymore that isn’t being overrun by lobotomized p-zombies (partisan zombies)? Is it even possible to make such a public space? Ressentiment consumes all discourse.

by speed_spread6 hours ago

Yet accumulation of power by a very small elite through state and selected corporations happens to be a defining characteristic of that political regime.

by cyanydeez7 hours ago

You're right, it's full of corporate sock puppets shilling their vaporware, idly dreaming that the world isn't what it is.

by bluerooibos4 hours ago

> Probably because American AI companies are on the hook for quite a lot of investment money

That's a lot of words to say it's just capitalist greed.

by spacebacon7 hours ago

[dead]

by budsniffer9528 hours ago

Do you think that DeepSeek are building their models for free, or something? They aren't "on the hook" for anything?

What's with all the China glazing about this stuff? They release some open-source work and people act like they are suddenly the beacon of freedom and transparency.

by abc123abc1238 hours ago

This is incorrect binary thinking. Them releasing open source can be good, but that does not commit you to thinking that China or Chinese companies are saints. There are many shades of grey here, and one does not exclude the other (nor include it).

by budsniffer9527 hours ago

Are you reading the comments?

by 134156 hours ago

I think there are some sockpuppet accounts active but what also contributes is that many people are absolutely fed up with US technological hegemony and welcome alternatives to core technologies from elsewhere.

by 1over1376 hours ago

Not just US technological hegemony, but the USA has threatened to invade Europe (Greenland) and Canada, and has actually invaded Venezuela and Iran. China hasn't. Maybe lots of people that live in those places are now switching sides.

by dgellow5 hours ago

Over the past 2 years the US also started a trade war with Europe, triggered the worst oil shock the world has ever experienced for no reason, threatened to leave NATO, tried to force Ukraine to give up its territory to the invading country, and way more.

by 7speter8 hours ago

I think it’s in our best interest to push these American AI companies into exhibiting at least some degree of freedom and transparency any way we can…

by herodoturtle8 hours ago

Publishing by necessity, I wonder? American labs are on the cutting edge, pioneering the way forward, so DeepSeek open sourcing what they’ve got helps even the playing field.

Hopefully the experts here can offer insight. The above is just my hunch and I’m not a specialist in this field.

by try-working7 hours ago

Yes, challenger labs publish out of necessity. It is a marketing strategy. People assume open source means giving something up, but the reality is that Z.ai has revenue of around $100M, and it would be about $0M if they had never open sourced their models.

by jonplackett8 hours ago

Wouldn’t that just help the American labs anyway though? Or do they assume they’ve actually already figured this stuff out and kept it secret?

by vintermann7 hours ago

It used to be the case that the NSA hired the majority of all math graduates in the US and was assumed to be years ahead in cryptography. Yet in the 90s it became clear that this was no longer true: among other things, the cipher of the notorious Clipper chip was broken, and we can rule out that it was made weak on purpose, because the whole point of Clipper was that they had a backdoor.

So, despite hiring the cream of the crop of math graduates, who could read the papers of free academia but whose own results the free world could not access, they fell behind.

I have a theory explaining why. I think it's because science is an interactive process. NSA cryptographers could read papers, but they couldn't talk openly with the authors of those papers because of secrecy demands; even asking questions might indicate what they were working on. You can easily imagine them spending months on something they could have avoided by going to the original authors and being told "Oh, we tried that for a long time, it doesn't work".

Whether that theory is right or not, cryptography is a concrete example of a domain where public research with fewer resources beat private research with a lot more resources.

by idiotsecant6 hours ago

Everyone in this thread is getting distracted by nationalism, but you hit the nail on the head. In this case for whatever reason the Chinese AI industry is collaborative and the American AI industry is not. This will result in the Chinese companies making progress faster. Full stop. This isn't a judgement on the merits of either system, only an observation of likely results.

by tiahura4 hours ago

Hasn't that been the mantra of open source for 40 years? Armies of companies, trillions in valuation, or even just Wayland suggest that isn't always the case.

by eikenberry29 minutes ago

So free software can only be considered a successful strategy if every single project succeeds?

by NamlchakKhandro5 hours ago

Reminds me of .NET from the early 2000s to 2012... No one collaborated.

by 7speter7 hours ago

From what I gather, the Chinese are behind, but a lot of their research amounts to scrappy, clever discoveries in how to use novel technologies (for Qwen and DeepSeek, it’s mixture-of-experts models, which can do inference using only a portion of the model at a time). The Chinese labs also distill information from American models, so there’s that.

The American companies, from my impression, don’t involve themselves with such lowly “hacks” because they have so much money to just push forward with doing everything on big, heavy models that run on the most cutting-edge Nvidia chips that they can, at the moment, kinda sorta get on demand (I say that in some degree of jest).

by idiotsecant6 hours ago

The American companies would love to develop these 'hacks' because it would make them more money, something they are in existential need of right now.

They don't develop them because they don't collaborate publicly anymore.

Where would the whole industry be if Google never allowed publishing the transformers paper?

It's not a coincidence that the American AI industry grew fastest in capability when it was the most open.

by 7speter4 hours ago

Just a crazy Catch-22, it seems

by tiahura4 hours ago

Why would they collaborate? Why not defect and just keep theirs private and implement the open ones?

by _0ffh8 hours ago

I'm afraid I'm even balking at the word "pioneering" in the context of US frontier labs. They are probably doing a few new things, sure, but they are not blazing any trails for others to follow; the Chinese are.

by skeledrew7 hours ago

> Publishing by necessity

It's more a cultural thing. Sharing progress is just in their blood.

by idiotsecant6 hours ago

This is overly simplistic to the point of glazing. Plenty of Chinese companies maintain industrial secrets to gain an advantage.

by epolanski8 hours ago

Chinese papers and techniques have been very influential and copied by US labs.

Multi-head Latent Attention (MLA), Multi-Token Prediction (MTP), and the MoE architecture are some of the most famous examples.

by HarHarVeryFunny6 hours ago

MoE is from Google (Noam Shazeer)

MTP is from Meta

Another DeepSeek advance that the west are copying is DeepSeek Sparse Attention (DSA)

by xgk18 minutes ago

Mixture-of-Experts (MoE) was introduced in the 1990s [1, 2]; see also [3, 4]. The idea was that MoE scales up model capacity while introducing only a small computational overhead. MoEs did not become viable for high-performance applications until sparse routing was integrated with modern deep networks, made possible by large-scale distributed computation. The breakthrough came with the development of sparsely gated networks [5], which showed that it is possible to maintain model accuracy while activating only a small fraction of a large parameter network during both training and inference.

[1] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, G. E. Hinton, Adaptive mixtures of local experts. (1991)

[2] M. I. Jordan, R. A. Jacobs, Hierarchical mixtures of experts and the EM algorithm. (1993)

[3] L. Xu, M. Jordan, G. E. Hinton, An alternative model for mixtures of experts. (1994)

[4] S. Waterhouse, D. MacKay, A. Robinson, Bayesian methods for mixtures of experts. (1995)

[5] N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. Le, G. Hinton, J. Dean, Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. (2017)
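As a rough illustration of the sparse gating idea in [5], here is a minimal plain-Python sketch. A linear router scores every expert, only the top k experts are actually run, and their outputs are mixed with softmax weights renormalised over those k. It leaves out the paper's gating noise and load-balancing loss. The names (`top_k_gate`, `moe_forward`) and the toy experts are made up for the example.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_gate(x, gate_weights, k):
    """Pick the k highest-scoring experts for input x.

    gate_weights holds one weight vector per expert (a linear router).
    Returns (expert_index, mixing_weight) pairs whose weights sum to 1.
    """
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in chosen])  # renormalise over the chosen k only
    return list(zip(chosen, weights))

def moe_forward(x, gate_weights, experts, k=2):
    """Mix the outputs of the k selected experts; the other experts never run."""
    out = [0.0] * len(x)
    for idx, weight in top_k_gate(x, gate_weights, k):
        y = experts[idx](x)
        out = [o + weight * y_j for o, y_j in zip(out, y)]
    return out

# Toy setup: 8 experts, where expert i simply scales its input by i.
experts = [lambda x, s=s: [s * v for v in x] for s in range(8)]
gate_weights = [[float(i), 0.0] for i in range(8)]
print(moe_forward([1.0, 0.0], gate_weights, experts, k=2))
```

Here only 2 of the 8 experts are evaluated per input. That is where the compute savings come from: capacity grows with the number of experts, while per-token cost grows only with k.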

by thesmtsolver217 minutes ago

This is so out of touch. Go to Neurips or the top AI conferences to see what is happening.

by gmerc6 hours ago

Deepseek is commoditizing the performance gains US labs rely on to make their investors money.

by SubiculumCode1 hours ago

If American labs aren't publishing, it doesn't mean they aren't doing even more interesting work.

by californical1 hours ago

You could also come up with a cure for cancer, but if nobody knows what you’ve done then there’s not a whole lot we can say about it

by garn8106 hours ago

Yep. It's about time the Western world realized the Chinese are not the "very bad guys under dictatorship"

by cloudfudge3 hours ago

I don't think it's very common to believe the Chinese people are bad guys. It's the government and its control of the people that's the problem. And no, I don't think the US is immune to that sort of problem either.

by 3abiton6 hours ago

Honestly, it's just a difference in hierarchy between the two countries. In the US, tech/finance/military companies have the upper hand over the government (which is fragmented between 2 parties). Despite the charades with Anthropic, tech-fluencers are in control. In China, by contrast, the government (a dictatorship) has more control over tech companies (take any example from the past 10 years). For them, undermining US AI supremacy is an objective, and releasing open-weight models is the way to do it, and I'm all for it.

by idiotsecant6 hours ago

Let's not get crazy here. You can acknowledge that the Chinese AI industry has some structural advantages right now without trying to claim anything else. China is still a brutal autocracy.

by pmarreck36 minutes ago

They push the boundaries, alright. Of obtaining the results of work without doing the work themselves, which I hate to say it but this is classic Chinese machiavellianist business behavior:

https://www.cnbc.com/2026/06/24/anthropic-alibaba-distillati...

by teekert5 hours ago

I'm deep seeking for that open in OpenAI indeed. It’s clear who’s the most anthropocentric in this space.

by godwinson__4-822 minutes ago

The idea that America is going to stay ahead of China is I think at this point clearly delusional. It's also just such silly framing. Why should 350 million people stay ahead of 1 billion people on the other side of the world? If an AI lab in China cures cancer or something do Americans lose?

So many Americans seem to (at least in theory) be ready to sign up for this ongoing confrontation with China. Does anyone think it isn't America who is poking the bear when it comes to the Thucydides trap? Why not try to get along? It occurs to me the only people more Chinese innovation would hurt are the mega-cap class in the United States. Elon Musk certainly doesn't want BYD in the United States. Same story all the way down with these super-capitalized AI companies. Most average Americans would probably be better off in a world where the United States and China got along. But it's those Americans who will be called upon to suffer most of the burden if that trap ever springs.

by thesmtsolver220 minutes ago

By this population-only logic, you should concede that India will overtake China.

Why not talk about how China shut out American companies for decades before complaining about BYD?

As an Indian immigrant, I'd point out that the PRC has engaged in conflict with almost all its neighbors and started wars in its short history.

China is not so benevolent when they get to the #1 spot:

https://m.economictimes.com/industry/renewables/china-wto-co...

by godwinson__4-89 minutes ago

It's not population-only logic, but it does underscore that it is silly to expect the United States to inevitably be ahead.

As for the rest of it:

https://youtu.be/74DAI2hr9Kk?t=159

by rvz8 hours ago

Exactly. They did not have to open up their research, and this is what happens when smart researchers are forced to squeeze performance gains out of existing hardware.

They don't have TPUs or access to the latest Vera Rubin GPUs either to get performance gains for free. All of the optimizations Deepseek have done are in software and it goes down to the PTX assembly level.

Compared to Anthropic who are celebrating in fixing a flickering issue in a terminal app which took months to fix.

by HarHarVeryFunny6 hours ago

> All of the optimizations Deepseek have done are in software and it goes down to the PTX assembly level

DeepSeek are still using NVIDIA (PTX) to train on, but for inference have already transitioned to Huawei Ascend chips, and inference speed is what this paper is addressing.

by vidarh8 hours ago

> Compared to Anthropic who are celebrating in fixing a flickering issue in a terminal app which took months to fix.

It's funny, because if you ran Claude Code on a slow terminal, the cause of the flicker was obvious: in a number of situations they kept dumping the entire history of the chat back into the terminal, and relied on the terminal to then end up in the correct state.

by yorwba8 hours ago

Anthropic almost certainly also has optimized software down to the assembly level, considering this take-home interview challenge they published: https://github.com/anthropics/original_performance_takehome/... which is all about instruction-level performance optimizations. That they don't prioritize UI fixes just means they consider other things more important.

by lelanthran7 hours ago

Unlikely: that product is written completely by AI, of which they are not lacking.

More likely, an AI-generated codebase is impossible for humans to fix, and SOTA models weren't able to figure it out until now.

by lionkor7 hours ago

that's pretty silly to use as a measure of what they do internally

by saagarjha6 hours ago

It's pretty representative of what they do internally

by saagarjha6 hours ago

All frontier labs are working down to the PTX level (and lower)

by utopiah7 hours ago

It's almost as if... they were what OpenAI was when it started. Sad to see, but glad someone is doing it.

by darkoob127 hours ago

Google and Microsoft publish more than enough, and American universities are publishing the science beyond DeepSeek's engineering. The fact that you don't know about them means you're not following the science, only reading Hacker News.

by kamranjon4 hours ago

Google hasn’t published much in depth ML work since T5 (which was hugely influential at the time) - most Gemma releases are 1-3 page model card pdfs these days with no in depth analysis. Even TurboQuant is shaking out to have basically been a rehash of previous work without proper attribution. I do think Microsoft is doing some interesting things with smaller models but haven’t read much research, interested in any refs you might have to share!

by darkoob121 hours ago

Check recent ICLR, ACL, ICML, and NeurIPS proceedings and you will see 10-20 papers from Google Research that are not just simple model cards. They are solid, reproducible research.

by epolanski8 hours ago

R1 was very influential on US models development.

by 7 hours ago

deleted

by resters5 hours ago

Thank you so much to everyone at DeepSeek who is working on this and who have the courage and generosity to open source this for humanity.

We in the United States will never forget!

For all the harm Trump does to the US at least he is helping China!

by jmyeet8 hours ago

Chinese companies (and labs) operate in conjunction with the CCP so whatever they're doing, it's because it's Chinese state policy.

What became clear when DeepSeek came onto the scene was that China was seeking to commoditize LLMs. They consider it an issue of national security not to be beholden to US tech companies when it comes to AI. And I, for one, fully endorse this policy.

Another data point on this is the black market for Claude tokens in China [1]. The chat logs themselves are a commodity to train models.

I believe that OpenAI in particular is a bet on a trillion dollar pot of gold that doesn't exist. Google, Microsoft, Amazon and Meta will all be fine. Anthropic is in a far better position than OpenAI (IMHO) but if DeepSeek or some other Chinese open weight model gets as good at coding, they're in real trouble too.

[1]: https://news.ycombinator.com/item?id=48667495

by anon3738397 hours ago

I don’t see how Anthropic is in a better position. They have a slight edge in model quality right at a time when we’re getting a taste of what cheap, “good enough” AI looks like. They don’t own their own compute. And their own arrogance and lies have alienated a huge chunk of their customer base and alerted everyone to the dangers of being dependent on them.

by jmyeet7 hours ago

I personally think not owning their own compute is going to be an advantage.

There is a meteor headed towards all this AI investment that I don't think has been properly accounted for, and that is: what happens to all the existing hardware investments when NVidia's next architecture comes out? Blackwell (B100/B200) is the current generation. Rubin (R100, presumably R200) is next and arrives soon. A lot of the investment hasn't been spent yet, so it will likely be spent on Rubin, but at that point, what happens when the next iteration comes out and does 3-4x the compute for the same electricity input and the same hardware cost?

Also, what happens when people can run way bigger models on consumer hardware in 5 years? The effective limit for useful local LLMs is currently ~31B parameter models because the RTX 5090 has 32GB of VRAM and Apple's shared memory architecture, which can keep bigger models in memory, just doesn't have the raw processing power.
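The ~31B figure is easy to sanity-check: weight memory is roughly the parameter count × bits per weight / 8. Below is a minimal sketch of that arithmetic. The helper name is made up, and it counts the weights only. The KV cache, activations and runtime overhead all need room on top, so the practical ceiling also depends on quantisation and context length.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    # params * bits / 8 gives bytes; with params in billions the result is already in GB
    return params_billions * bits_per_weight / 8

for bits in (16, 8, 4):
    print(f"31B params at {bits}-bit: {weight_memory_gb(31, bits):.1f} GB")
```

This prints 62.0, 31.0 and 15.5 GB. So a 31B model only squeezes into 32 GB at around 8 bits per weight, and 4-bit quantisation roughly doubles the parameter count whose weights fit, before accounting for the KV cache.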

Anyway, the reason I argue Anthropic is in a better position (than OpenAI) is that they seem to have captured a market that may well be profitable for them as a company, specifically Claude for coding. So they just haven't burnt quite as much cash as OpenAI and aren't in as deep a hole.

While I think local models are going to improve massively over the next few years, running them in a data center at scale is always going to be cheaper for a company. Why? Because they can amortize their costs by running 24/7, and powering and cooling them is simply cheaper at scale when you're talking about 1000+ engineers who otherwise might only be using their hardware ~40 hours a week.
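The amortisation argument is essentially utilisation arithmetic: a week has 168 hours, and a workstation busy for 40 of them spreads its fixed cost over far fewer useful hours. Below is a minimal sketch using a hypothetical $1/hour all-in ownership cost. It simplifies by treating that cost as fixed; in reality power is only drawn while the hardware runs. It also leaves out the scale savings on power and cooling, which would widen the gap further.

```python
HOURS_PER_WEEK = 24 * 7  # 168

def cost_per_used_hour(ownership_cost_per_hour: float, hours_used_per_week: float) -> float:
    """Spread a fixed weekly ownership cost over the hours the hardware is actually busy."""
    weekly_cost = ownership_cost_per_hour * HOURS_PER_WEEK
    return weekly_cost / hours_used_per_week

# Hypothetical $1/hour all-in cost of owning one accelerator.
workstation = cost_per_used_hour(1.0, 40)             # one engineer, ~40 h/week
datacenter = cost_per_used_hour(1.0, HOURS_PER_WEEK)  # shared and kept busy 24/7
print(f"{workstation / datacenter:.1f}x more per useful hour on the workstation")
```

Under these assumptions the workstation costs 168/40 = 4.2x more per useful hour.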

IMHO Google is in the best position here of all the US companies, even though their models aren't the best, because their data centers are ruthlessly efficient, their homegrown TPUs will eventually catch up (and thus avoid the NVidia tax) and they simply haven't bet the farm on winning AI.

by Schiendelman6 hours ago

I'm generally with you on all of these ideas.

However, Google probably won't catch up. Nvidia has been winning in spite of the fact that their hardware is general purpose rather than tuned for inference.

Rubin has architectural differences I don't understand that are supposed to make inference much cheaper and faster while still retaining those other more generic capabilities. Their next generation after that is going to do even better at being fast for inference and general purpose.

Google is betting that their TPUs won't depreciate faster than the markup they have to pay to Nvidia. I don't think they will be right.

by Der_Einzige4 hours ago

Why do people who don't follow A100 prices talk like they know things about GPU pricing dynamics?

A100s are ~7 years old and going for more than 2 dollars an hour, significantly more expensive than even 2 years ago. This is because anything with 80gb of VRAM or more and made by Nvidia will have economically useful lifespans of like, 10 years.

I could see H100s getting 12 years.

Michael Burry doesn't know shit about GPUs.

by jmyeet3 hours ago

So I was curious about how A100s would do running DeepSeek v4. I can't find any instances of running v4 Pro on even an 8xA100 cluster, so you need to run Flash at ~284B params. A100s don't support FP8, so you're running FP16 and taking a hit that way. But I see estimates of 30-50 tok/s for an 8xA100 cluster. They're drawing 300-400W each, so you're looking at probably 3500+ watts, which is roughly 0.01 tok/s per watt (about 0.01 tokens per joule).
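The energy figure works out as tokens per second divided by watts, which is tokens per joule. Below is a quick sketch plugging in the estimates above (30-50 tok/s at ~3500 W). It also checks the FP16 memory footprint of ~284B parameters, assuming the 80 GB A100 variant. Helper names are illustrative.

```python
def tokens_per_joule(tokens_per_second: float, watts: float) -> float:
    # (tokens / s) / (J / s) = tokens per joule
    return tokens_per_second / watts

def kwh_per_million_tokens(tokens_per_second: float, watts: float) -> float:
    seconds = 1e6 / tokens_per_second
    return seconds * watts / 3.6e6  # 1 kWh = 3.6e6 J

for tps in (30, 40, 50):
    print(f"{tps} tok/s: {tokens_per_joule(tps, 3500):.4f} tok/J, "
          f"{kwh_per_million_tokens(tps, 3500):.1f} kWh per 1M tokens")

# FP16 weights for ~284B parameters vs. the HBM on 8 x 80 GB A100s.
weights_gb = 284 * 2  # 2 bytes per FP16 weight
hbm_gb = 8 * 80
print(f"weights {weights_gb} GB of {hbm_gb} GB, {hbm_gb - weights_gb} GB left for KV cache etc.")
```

At the midpoint of 40 tok/s this comes to about 0.011 tok/J, or roughly 24 kWh per million tokens. The weights alone also leave only 72 GB across the whole node for everything else.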

Now jump ahead 2 years and you seem to have a massive jump in performance [1]. The tokens/Watt goes up by at least 2 orders of magnitude. And the B100 is 3-4x that. And we're about to hit the R100 (Rubin) cliff.

That's what this is going to come down to. When hyperscaler DCs are getting to gigawatt power usage, it all comes down to power efficiency. Those A100s aren't far from being sold for scrap.

I've been looking into how different companies are handling depreciation for this. Amazon seems to be saying the life is 3-4 years, Google 4-5 and Meta is saying 8+, which I think is wildly optimistic.

[1]: https://lambda.ai/inference-models/deepseek-ai/deepseek-v4-f...
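The assumed lifetime matters so much because, under straight-line depreciation, the yearly expense is cost divided by useful life. The same hardware therefore hits earnings very differently depending on the schedule. Below is a minimal sketch with a hypothetical $1B fleet and no salvage value. The 4/5/8-year schedules are just illustrative points within the ranges quoted above.

```python
def straight_line_annual(cost: float, useful_life_years: int) -> float:
    """Yearly depreciation expense under straight-line depreciation with zero salvage value."""
    return cost / useful_life_years

FLEET_COST = 1_000_000_000  # hypothetical $1B of accelerators

for years in (4, 5, 8):
    annual = straight_line_annual(FLEET_COST, years)
    print(f"{years}-year schedule: ${annual / 1e6:,.0f}M per year")
```

Stretching the schedule from 4 to 8 years halves the reported yearly cost ($250M vs $125M). If the hardware actually becomes uneconomic sooner, the remaining book value has to be written down later.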

by tw19847 hours ago

> Another data point on this is the black market for Claude tokens in China [1]. The chat logs themselves are a commodity to train models.

anyone with IQ higher than 130 (thus qualified for actual AI R&D) would be questioning something obvious here -

if they are already doing such dodgy stuff with the aim to maximize profits, why would those resellers have large amount of logs with actual American model responses to sell to those AI labs in the first place. shouldn't they just post train & customize some leading Chinese open source models to pretend to be Opus or GPT for the vast majority of their users (as classified by some models) who don't know much about expected Opus behaviours & not skilled enough to tell the differences?

that is actually the interesting bit not covered in your censored version of the story line, it is also what happens on the ground. your censored version of the story implies that those dodgy resellers using stolen credit cards, pooling accounts with stolen IDs and illegally selling very personal logs would somehow be honest enough to spend extra $ to ensure their victims (aka paying users) can actually use real Opus and GPT. LOL

dude, you failed this IQ test miserably.

by saagarjha6 hours ago

You don't actually need a very high IQ to do AI R&D. More than it takes to post IQ comments on this site, maybe.

by jampekka7 hours ago

The galaxy brains in the labs putatively buying the logs wouldn't notice this? Or figure out a structure to prevent this?

by tw19847 hours ago

resellers wouldn't be trying to sell such junk in the first place. they use faked models to avoid the cost of Opus tokens, not to double dip to scam those with arguably the highest IQ in the country.

by 7 hours ago

deleted

by OtomotO7 hours ago

The difference between greed and power

by dakolli7 hours ago

It's because our culture worships pieces of paper the government tells us are worth something.

by mordae6 hours ago

Nope, people seek it out because government tells them to pay taxes _or else_.

by IAmGraydon7 hours ago

Money is just a physical representation of the ability to get what you want. The problem is not money. It’s the fact that we live in a “me” society.

by DivingForGold7 hours ago

Sure, in part by "stealing" from American AI companies with Distillation attacks:

https://yipzap.com/anthropic-accuses-alibaba-of-largest-ai-d...

by pennomi7 hours ago

If your moat is “please don’t copy my outputs”, you don’t have a moat. There is no such thing as a distillation “attack”.

by pmarreck29 minutes ago

How very machiavellianist-libertarian of you.

Don't even try to combine it with any notion of "leadership" then, however, since distillation is literally "copying the actual leader"

by steinvakt27 hours ago

How does it differ from pirating music or movies?

by Balinares6 hours ago

According to US AI labs, training on other people's output is fair use. So that's how.

by Zigurd6 hours ago

AI training is considered transformational. That's how AI training gets around copyright and it's probably consistent with copyright precedent. For example, indexing the web is considered transformational, even though you can recover the full text of everything in an inverted index.
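The inverted-index point holds for a *positional* index that records every token and its offset, because the postings are then just a rearranged copy of the text. Below is a minimal sketch assuming whitespace tokenisation and no stop-word removal; the names are illustrative. A plain term-to-document index without positions would lose word order and could not be inverted this way.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each term to a list of (doc_id, position) postings."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.split()):
            index[term].append((doc_id, pos))
    return index

def reconstruct(index, doc_id):
    """Rebuild one document's token sequence using nothing but the index."""
    slots = {}
    for term, postings in index.items():
        for d, pos in postings:
            if d == doc_id:
                slots[pos] = term
    return " ".join(slots[p] for p in sorted(slots))

docs = {1: "to be or not to be", 2: "the index keeps every position"}
index = build_positional_index(docs)
print(reconstruct(index, 1))  # to be or not to be
```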

by pornel6 hours ago

Machine-extruded text is not copyrightable, since there was no human creativity involved in producing it.

(and if you argue the US models do produce copyrighted works, then oooops - whose copyright is it huh?)

by bethekidyouwant6 hours ago

Ow my head.

by ReptileMan6 hours ago

That when I pay for a model, the copyright of the output belongs to me. This is as work for hire as it gets.

by Jonnerz6 hours ago

US AI companies trained their own models on vast amounts of copyrighted and publicly available content without obtaining permission. There's no moral high ground here.

by pmarreck31 minutes ago

You know what, if someone wants to downvote this guy by claiming distillation attacks are not "attacks" or don't cross some ethical bound (especially since I just posted a similar comment), then go right ahead, but if you're combining it with any notion of "leadership", that's like saying that the person in 2nd place in a bike race who is drafting behind the person actually in 1st place is exhibiting "leadership".

There's no "leader" if, absent someone whose results you're copying, you are an emperor without clothes

by orbital-decay2 hours ago

Besides "attack" being a ludicrous name for distillation, note how your article says "accuses", also it's mostly about Alibaba, not DeepSeek (although it's mentioned there). Both Dario Amodei and Sam Altman publicly claimed that DS used their outputs to train their models, and knowing the differences between all these models by heart, I believe they're simply lying through their teeth to sway the public opinion and/or the policy. These models are absolutely nothing alike, and distillation necessarily makes student's outputs similar to teacher's. This is very visible in Z.ai models (which were trained on Gemini outputs to the point that they repeated Google's conditional prompt injections in the CoT, and later on Claude where it started repeating their CoT as well) and certain Google models which were trained on Claude's outputs in a roundabout way. Distillation always shows up in the result.

And certainly they have no idea whether these outputs (assuming they ever existed and it wasn't made up) were used for training. The article mentions that DS made 150k requests. This isn't much and might have been just an eval or a benchmark to compare their own model against. It's really hard to believe DeepSeek had any Claude outputs anywhere in their training schedule, since it's just too different. Besides training on random vibecode of course, which is mostly written by Claude.
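Why distillation "always shows up in the result": the classic soft-target objective from Hinton et al.'s knowledge distillation trains the student to match the teacher's output distribution, and the loss reaches zero only when the two distributions agree. Below is a minimal sketch using temperature-softened softmaxes. Distilling through a chat API usually gives you only sampled text, not logits, so in practice it looks more like fine-tuning on teacher outputs. That pushes the student in the same direction.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)
    return kl * temperature ** 2

teacher = [2.0, 1.0, 0.0]
print(distillation_loss(teacher, [0.0, 1.0, 2.0]))  # student disagrees: larger loss
print(distillation_loss(teacher, [1.5, 1.0, 0.5]))  # closer to the teacher: smaller loss
print(distillation_loss(teacher, [3.0, 2.0, 1.0]))  # same distribution (shifted logits): 0.0
```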

by NitpickLawyer5 hours ago

While I don't agree with your comment being downvoted, I don't think distillation is either an "attack" or "stealing". The idea that someone else gets to decide how I use tokens that I pay for is ludicrous.

Imagine if your Casio calculator came with a ToS saying you can't use it to develop a competing calculator or any other tools. Or that your hammer can't be used to make other tools. Or, closer to the HN crowd, imagine MS in the 90s saying that you can't use their OS to build services that compete with MS. They'd be laughed at and broken up immediately if they tried that.

The only thing they can do is to refuse serving tokens (and even that's debatable, if we get to tokens being commoditised). But that's gonna be a game of whack-a-mole, and they know it.

by kamranjon8 hours ago

The Hugging Face models are already up and seem to be the original models with the speculative decoding module built in, which is very cool:

Flash: https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash-DSpark

Pro: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro-DSpark

Excited to see if this makes it into DwarfStar for local inference, have been using the flash model extensively since the 2-bit quants were made available by antirez.
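For anyone wondering what a built-in speculative decoding module buys you, here is a minimal sketch of the generic greedy draft-and-verify loop. It is not DSpark's own algorithm. A cheap draft model proposes k tokens and the target model checks them. Every token kept is exactly what the target would have produced on its own, so output quality is unchanged, while the expensive model verifies several positions per step. The toy "models" are arbitrary functions, and the sampling-based acceptance rule used for non-greedy decoding is left out.

```python
from typing import Callable, List

# A toy "model": maps a token prefix to its greedy next token.
Model = Callable[[List[int]], int]

def greedy_decode(target: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target step per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target(out))
    return out

def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> List[int]:
    """Greedy draft-and-verify; the result is identical to greedy_decode(target, ...)."""
    out = list(prompt)
    goal = len(prompt) + n_new
    while len(out) < goal:
        # 1. The cheap draft model proposes up to k tokens, one after another.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2. The target checks every proposed position. A real system scores all of
        #    them in a single batched forward pass; here we loop for clarity.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)  # always keep the target's token
            if tok != expected:
                break  # first mismatch: discard the rest of the draft
        else:
            # Every draft token matched, so the same pass yields one bonus token.
            if len(out) + len(accepted) < goal:
                accepted.append(target(out + accepted))
        out.extend(accepted)
    return out

def target(seq: List[int]) -> int:
    return (3 * seq[-1] + 1) % 10

def draft(seq: List[int]) -> int:
    # Agrees with the target except right after a 7, where it guesses 0.
    return 0 if seq[-1] == 7 else target(seq)

print(speculative_decode(target, draft, [2], 8))  # [2, 7, 2, 7, 2, 7, 2, 7, 2]
```

The speed-up in practice depends on how often the draft agrees with the target. A well-matched draft lets each expensive verification pass commit several tokens at once.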

by ilaksh5 hours ago

Any chance they will have this for Qwen 27 b also?

by kamranjon5 hours ago

The paper actually references testing their DSpark speculative decoding strategy with Qwen 3 4B, 8B and 14B models. So while I doubt they will release builds themselves, they’ve open sourced their training pipeline for this (DeepSpec), so we will likely see folks adopting it for other models.

by StizzurpXDD6 hours ago

DeepSeek is, as I feel currently, the sole AI company which is actually trying to innovate rather than top mere benchmarks. Others like OpenAI, Anthropic and Google are mostly just competeing with each rather than keep innovating around the clock.

by pmarreck28 minutes ago

Please explain how distillation == innovation

Especially since your 5-day-old account is sus, and hasn't yet proven it isn't a Chinese bot

You can't lead by following the actual leader LOL

The only real innovation I've seen from DeepSeek is the out-loud reasoning thing in R1

by Alifatisk6 hours ago|

> DeepSeek is, as I currently feel, the sole AI company that is actually trying to innovate rather than merely top benchmarks.

I'd also include the other Chinese labs like Moonshot (behind Kimi) and Z.ai (behind GLM). They are innovating and continue to share their research openly with the public. I believe the founder of Moonshot even shared a 40-minute video on Twitter where he goes through the techniques that power Kimi.

by nicce5 hours ago|

> Others like OpenAI, Anthropic and Google are mostly just competing with each other rather than innovating around the clock.

For a long time, the strategy for most companies in the US has been to capture the social audience, whatever the means. Quality and innovation come second. Capture the market, lock in the users, influence regulation and lobby to keep the power.

by otterley3 hours ago|

> Others like OpenAI, Anthropic and Google are mostly just competing with each other rather than innovating around the clock.

They compete with each other by innovating. The innovations result in more utility for the customer, but the technology isn't made public. Trade secrets are secret for a reason.

People may think that DeepSeek is the "most innovative" because of what they can observe from the outside, much like people may mistakenly conclude that fashion models are the "prettiest of the population" because not everyone is photographed for public consumption.

by spongebobstoes5 hours ago|

the big labs have already been doing this for at least a year

by kcb12 minutes ago|

Yes, all the closed providers are probably doing this already. As well as open models like Gemma and Nemotron.

by smcleod6 hours ago|

Qwen as well.

by kamranjon6 hours ago|

There was a recent exodus from Qwen of researchers who supported its open source efforts; I'm not sure we will see many new open models from them past the 3.6 series.

by markasoftware25 minutes ago|

are you able to say more about this (specifically, that the researchers who left were concerned that Qwen models were being closed-sourced)? I was under the impression that the Chinese labs and their employees aren't particularly concerned about ideology/safety/whatever and were just releasing open source models because it helps with publicity and does the most damage to the US AI labs. (And to be clear, I strongly support open source models; I just doubt that the Chinese labs and their employees are actually motivated by morals.)

by kamranjon10 minutes ago|

This was the easiest link I could find. TechCrunch also has an article on the departure, but this one didn't require turning off ad blockers and has a bit more rumor-mill stuff: https://chinabizinsider.com/alibabas-qwen-faces-turmoil-as-t...

by jimmydoe5 hours ago|

Besides the founder, the only real external investor in DeepSeek is the Chinese government. There is literally zero revenue pressure compared to O, A & G.

To compete in that direction, USG needs to learn from CCP to "seize the means of production", which they are sort of doing, but in such an incompetent way that I'm afraid we will probably end up mixing the worst of both communism and capitalism.

by otterley2 hours ago|

> To compete in that direction, USG needs to learn from CCP to "seize the means of production"

No, they don't. The U.S. Government is free to launch its own AI labs if it wishes (and even compete with the private sector), but that doesn't mean it has to confiscate existing investments and capital. But Congress is unlikely to do that, because we've learned over the course of history that in well-functioning competitive markets, publicly operated services tend to be worse than private ones across multiple dimensions.

Chinese companies are largely where they are not because they're state funded, but because they operate in ways that would be considered criminal in the U.S. If they didn't constantly trespass on OpenAI and Anthropic to try to achieve product and technological parity, they would be too far behind to produce innovative research.

by altcognito5 hours ago|

China is just taking a lot of ideas from the era when the USG was doing things correctly and using them for innovation.

In this case, it feels like they are just funding multiple independent pure research projects and letting the chips fall where they may.

Doesn't even really seem like Europe can coordinate that.

by piterrro9 hours ago|

I've been using DeepSeek V4 Pro for a month now in Kilo Code and it's great. Fast, reliable, large context window and cheap as… I did 1.5B tokens this month and it cost me 40 USD (mostly cached, but still).

by fer4 hours ago|

Which provider? I went through 40 bucks on it on OpenRouter. It was not a lot of back and forth; the context ended at around 300k, with 15k LOC of output. I was using OpenCode and I'm unsure whether I can make the total token count visible.

by peheje3 hours ago|

OpenRouter sometimes chooses a very expensive provider. Try the floor slug or choose the provider directly. I switched to just putting 5 dollars directly on DeepSeek instead of going through OR.

by apitman4 hours ago|

Have you compared Kilo to Pi or OpenCode? Those are the two I'm most familiar with, but I'm always looking for alternatives.

by richardlblair6 hours ago|

I've been using omp with DeepSeek as my task and quicktask agents, and Sonnet for everything else.

It's drastically reduced my AI spend. I went from spending $40/day to $10/day.

by spiderfarmer9 hours ago|

Is there a way to see how many tokens one uses with Claude Code (Pro)?

by bpavuk8 hours ago|

the casino has no clocks, as one HN user put it some time ago.

I second ccusage, it's nice

by cptchaos8 hours ago|

https://ccusage.com/

by edg50008 hours ago|

It's in the JSON files in ~/.claude, but only for the last 30 days, I think. You can have the model analyze the history, so for a complete history you'd need to run the analysis on a cron job or something. Kinda hacky.

by Stagnant2 hours ago|

The 30-day limit can be overridden by adding "cleanupPeriodDays": 9999 to .claude/settings.json

by Havoc9 hours ago|

Nice.

I'm guessing the timing isn't accidental: demonstrated openness vs. harsh regulation.

by cr125rider5 hours ago|

China = Open. US = Harsh Regulation

Strange timeline, though this only works because it’s aligned with Xi’s goals.

by Havoc5 hours ago|

Yeah, I can definitely see a world where China pivots and we're stuck with closed/closed

Mistral... don't fumble this

by declan_roberts5 hours ago|

Nobody forced Anthropic to go on a media blitz loudly proclaiming the dangers of their new AI model. Serves them right, honestly.

by xnx4 hours ago|

Is this newer/better than the speculative decoding from 2022? https://arxiv.org/abs/2211.17192

by alok-g54 minutes ago|

That paper is cited in the 'introduction' and 'background' sections. This paper improves on it by removing some bottlenecks.

by tiahura1 hours ago|

Seems like they focus on improving the drafter and the verification policy so that speculation keeps producing net speedups, rather than wasted verification work, at DeepSeek scale.
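Here is a minimal greedy draft-then-verify loop in Python that shows the basic mechanism. It follows the spirit of the 2022 speculative decoding paper linked in this thread, not DSpark's actual drafter or verification policy. `toy_target` and `toy_draft` are hypothetical stand-ins for a large model and a small model. In a real engine, the inner verification loop is a single batched forward pass of the target over all proposed positions. Sampled (non-greedy) decoding uses a rejection-sampling acceptance rule instead of exact matching.

```python
from typing import Callable, List, Tuple

# Toy "model": maps the token sequence so far to its greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(target: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one sequential target step per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy draft-then-verify; returns (sequence, verification rounds)."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(seq) < goal:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. The target checks each proposed position. A real engine does this
        #    in ONE batched forward pass; the loop here is for clarity only.
        rounds += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok == expected:
                accepted.append(tok)       # draft agreed with target: keep it
            else:
                accepted.append(expected)  # first mismatch: use target's token
                break
        else:
            # All k drafts accepted; the same pass yields one bonus token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[:goal], rounds


# Hypothetical toy models: the draft is wrong only right after token 7.
def toy_target(seq: List[int]) -> int:
    return (seq[-1] + 3) % 10


def toy_draft(seq: List[int]) -> int:
    return 5 if seq[-1] == 7 else (seq[-1] + 3) % 10


if __name__ == "__main__":
    out, rounds = speculative_decode(toy_target, toy_draft, [1], n_new=12, k=4)
    assert out == greedy_decode(toy_target, [1], n_new=12)
    print(out, rounds)
```

Every kept token equals the target's own greedy choice, so the output matches plain greedy decoding exactly. With these toy models, the 12 new tokens take 3 verification rounds instead of 12 sequential target steps. How much this saves in practice depends on how often the drafter agrees with the target.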

by articlepan5 hours ago|

The title is bad: it's the first line of the abstract instead of the paper title. Speculative decoding for LLM inference was published in 2022: https://arxiv.org/abs/2211.17192

This paper seems to be an improvement to speculative decoding but I haven't read it yet.

by ricardobeat9 hours ago|

Presumably this has been in production for a while, and is one of the reasons they were able to dramatically lower prices a month ago?

by chronogram7 hours ago|

Yes. Section 5 talks about real-world deployment: 5.1: "The DSpark draft models are co-deployed with the preview versions of DeepSeek-V4-Flash and DeepSeek-V4-Pro"; 5.4: "MTP-1 represents the former production setup, having been superseded by DSpark two weeks following the DeepSeek-V4-preview release."

by _0ffh8 hours ago|

Lookahead Sparse Attention should be playing a big role as well, as it dramatically slashes memory consumption.

by Jackobrien9 hours ago|

I see a world soon where there’s an extremely wide variety of small models for speculative decoding, unique to use cases, companies, and even individuals.

by nicce9 hours ago|

Hopefully that is the case, and hardware doesn't become impossible to get.

by pydry8 hours ago|

yes, heavily constrained by sophisticated guardrails.

this is definitely where things are going. the enormous "eat the world" models have extreme diminishing returns by comparison.

by Der_Einzige4 hours ago|

You clearly didn't read the recent speculative decoding papers, because it's been possible to use any model to speculate for any other model for a while. They solved the tokenization problems that prevented this in the past.

by pokot08 hours ago|

I am wondering if this is why they can offer their Pro model at ~1/4 of the price other providers charge for the same model, and whether other providers will be able to do the same in a short timeframe.

by sfifs6 hours ago|

I estimate inference runs at 90%-plus gross margins. Just work out the math on these servers. I am pretty sure any player can price down, but it wouldn't look good on an IPO prospectus.

by sschueller8 hours ago|

I have been heavily using DeepSeek V4 Pro at Max for a month now and I would say it is 100x cheaper. If I pay for Claude, I hit the limit so fast that I am always waiting 5 hours. Using the frontier models in Kilo I go through dollars, while doing the same thing via DeepSeek costs pennies.

by ddxv7 hours ago|

I believe the comment you replied to was talking about the cost on providers like OpenCode vs. the DeepSeek API. The DeepSeek API is even cheaper than the other providers for the same DeepSeek models.

by vidarh8 hours ago|

It'd presumably help a lot, but also when you use their endpoint they get more training data.

by nicce8 hours ago|

This applies to every provider. OpenAI seems to be the worst hoarder.

by pokot08 hours ago|

actually you can buy inference on third party providers that serve deepseek v4 pro with zero data retention (ZDR).

by nicce7 hours ago|

Only reliable way to have zero data retention is to self-host.

by LeBit7 hours ago|

True. But at some point you've got to close your eyes and take a step forward.

It’s like with VPN providers. Is Mullvad actually collaborating with law enforcement? They very well could be. It is a calculated risk.

Is DeepInfra actually logging and training or selling the logs? They could be.

by flipped7 hours ago|

Mullvad has proven it doesn't collect. It's laughable to even suggest it.

They have been raided multiple times, have undergone tons of audits, do bleeding-edge research on privacy-preserving tech, donate to GOS, etc. You don't see any other VPN company like this because none exists.

by nicce6 hours ago|

With Mullvad the threat space is also different. Most of the data is end-to-end encrypted anyway with proven methods. With LLMs you can't do that yet.

by CharlesW1 hours ago|

How does one find trustworthy ZDR DeepSeek 4 Pro providers?

by epolanski8 hours ago|

US labs do it too.

by minraws7 hours ago|

Name any 2 or 3 that have published bleeding-edge research or similar in the last 6 months.

Well, I can't think of even one at the moment. To be honest, I might be biased, but all the Chinese research labs are largely OSS, except Alibaba now.

I am certain there are lots of American labs that claim to do it, but either they are marketing hype, since they aren't even close to the frontier, or, conversely, they just don't make anything of significant value public/OSS.

by kcb11 minutes ago|

Google and Nvidia...

by flipped7 hours ago|

US labs are the biggest data brokers in history. They collect everything, dumb fuck.

by segmondy2 hours ago|

As we can see again, this has nothing to do with distillation. For every gain Chinese labs make, the US labs will accuse them of theft, yet they are constantly innovating.

by lelanthran7 hours ago|

These companies providing tokens, whether SOTA or not, that want to IPO are so fucked as time goes on.

They can't sell their SOTA models; the models they can sell are only slightly better than the open source ones yet cost 20x to 50x more; their TAM consists almost solely of developers; and no customer of theirs is actually boasting of increased profits as a result of AI...

I fear their time to IPO may have passed.

by utopiah7 hours ago|

The question is even, was there EVER a time for an IPO?

If the business model requires hundreds of billions to reach the required quality (R&D, but also infrastructure to collect data and train, either purchased or rented from 3rd parties) while "only" dozens of billions can be earned back (as costs still exist to earn it; it's not free once models are trained), then maybe there NEVER was, nor will be, a good time for an IPO in a rational market.

by notnullorvoid2 hours ago|

> in a rational market.

Unfortunately the market is often not rational in this way.

Hype within the retail market means there are suckers willing to buy. The institutional market knows there are suckers when the hype is high. Both drive the price up, and retail investors are the ones left when it falls.

by 28383838386 hours ago|

IPOs with massive bags can go like WeWork or SpaceX; it all depends on vibes. If they buy a couple more articles doomposting and glazing AI in the Financial Times right before exit, they will def find a bunch of boomers to buy their bags. If the narrative changes before they IPO, it's over.

by danielabinav1608 hours ago|

Would love to see these numbers reproduced on consumer GPUs, not just A100s.

by wolttam5 hours ago|

This is an efficiency improvement that significantly lowers the amount of memory you have to read, on average, per generated token during decode.

It should improve performance on most hardware because most LLMs are memory bandwidth bound during decode.
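Here is a back-of-the-envelope way to see why speculation helps a bandwidth-bound decoder. The 2022 paper linked in this thread (arXiv 2211.17192) models each drafted token as accepted independently with probability `alpha`. Under that simplifying assumption:

- A round with `k` drafted tokens emits (1 - alpha^(k+1)) / (1 - alpha) tokens on average for one pass over the target's weights.
- The wall-clock speedup divides that by (k*c + 1), where `c` is the cost of one draft step relative to one target step.

The sketch below just evaluates those two expressions. Real acceptance rates are neither constant nor independent, and the values in the example are illustrative, not DSpark measurements.

```python
def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Mean tokens emitted per target forward pass with k drafted tokens,
    assuming each is accepted independently with probability alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return k + 1.0  # every draft accepted, plus the bonus token
    return (1 - alpha ** (k + 1)) / (1 - alpha)


def walltime_speedup(alpha: float, k: int, c: float) -> float:
    """Speedup over plain decoding when one draft step costs c target steps
    and a verification pass costs about one ordinary target step."""
    return expected_tokens_per_pass(alpha, k) / (k * c + 1)


if __name__ == "__main__":
    # Illustrative values only: 80% acceptance and a drafter costing
    # 5% of a target step, for a few draft lengths k.
    for k in (1, 2, 4, 8):
        print(k, round(expected_tokens_per_pass(0.8, k), 3),
              round(walltime_speedup(0.8, k, 0.05), 3))
```

With `c > 0`, the speedup peaks at a finite `k`. Drafting more tokens adds draft steps whose output is increasingly likely to be thrown away, which is why drafter quality and verification policy matter.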

by tommica8 hours ago|

Maybe someday an 8GB video card can be used for coding...

by zftnb6662 hours ago|

AI making AI faster. Next up: AI writing papers about how AI makes AI faster

by rvz8 hours ago|

This is just one of many papers DeepSeek has released on serving models at extremely cheap prices, unlike the others taking on $100B+ of debt to build data centers for the same thing.

> As with V4-Flash, we treat this point as an indication that DSpark sustains useful throughput under an interactivity target that the baseline cannot efficiently support. At matched system capacities, DSpark delivers 57% to 78% faster per-user generation.

Reminds me of the flawed 2017 approach to scaling servers running memory-intensive technologies by adding even more servers to solve the problem. (It just increases costs.)

Rather than doing that, think about which critical parts of your app can be written in a more performant technology.

Fast forward to 2026, and now you can see who is just throwing more money at the problem, creating even more problems, whereas DeepSeek is giving us optimized solutions.

I know exactly who I would pay attention to, and it is absolutely not Anthropic.

by denverllc6 hours ago|

For so long American companies have operated under the assumption that servers are cheaper than developers, and that was used to justify all sorts of inefficient practices.

The last year has shown that’s not true anymore (even for web servers).

by simianwords5 hours ago|

...... are you really suggesting OpenAI and Anthropic don't have access to these techniques?

by wg04 hours ago|

That's why I pay them. Regularly. Without fail. Even though my token usage isn't that much.

But I vote for these heroes with my wallet. Did it again just yesterday.

by noIdeaTheSecond1 hours ago|

Kudos to you! If people realized how much power we had, we'd have a better world.

by bflesch7 hours ago|

At this point, why can't someone produce a fridge- or container-sized AI appliance based on legacy chips (12nm)? I imagine this would cover 80% of corporate use cases that need "google-in-a-box" functionality.

State-of-the-art process nodes are impossible to get, but if you have infinite solar energy during business hours, does it really matter? Every company has a parking spot, so this ASIC-like appliance could be as big as a shipping container.

If it could just run recent open models for a handful of users, it would be such a no-brainer to buy.

by scrlk7 hours ago|

See "exabox" from George Hotz: https://tinycorp.myshopify.com/products/exabox-preorder

by flipped7 hours ago|

No one's buying that shitbox.

by mrklol5 hours ago|

Why?

by sixhobbits7 hours ago|

Nvidia is already selling exactly this, I think; not sure when it's expected to ship.

by benjiro297 hours ago|

The issue is that there are only so many fabs in the world that make memory. And if you want the good stuff, you're easily going into 400 ~ 750B parameter models. That means 400 to 750GB of memory at FP8, or roughly half that at FP4.
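For sizing, a common rule of thumb is weight bytes ≈ parameters × bits per parameter ÷ 8. Here is a minimal sketch that counts weights only. KV cache, activations and quantization-scale overhead come on top, so real deployments need more than this.

```python
def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes), weights only."""
    # (params_billion * 1e9) params * (bits / 8) bytes / 1e9 bytes per GB
    return params_billion * bits_per_param / 8


if __name__ == "__main__":
    for params in (400, 750):
        for bits in (16, 8, 4):
            print(f"{params}B params @ {bits}-bit: "
                  f"{weight_memory_gb(params, bits):.0f} GB")
```

By this estimate, a 400B to 750B model needs roughly 400 to 750 GB for its weights at 8 bits, and about half that at 4 bits.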

Did I mention there are only so many memory makers, and they are all busy printing money with HBM memory?

Intel is trying, with Crescent Island, to make a 160GB GPU that uses LPDDR5X memory.

HBM takes several times the resources to make compared to basic DDR5 memory. So by going this route, you get more memory, with the disadvantage that it's only 700GB/s, vs. HBM pumping out terabytes per second like it's nothing.

If these cards are reasonably priced, they may be a good alternative to $10k 96GB Nvidia Blackwells... You give up token generation speed (heavily dependent on memory bandwidth) for more memory to run larger models on home/office/company servers.

The problem is, again, there are only so many memory makers, and it's not like the market is flooded with DDR5 memory anymore, as the big 3 moved a lot of production to HBM.

Another approach is SanDisk making HBF (High Bandwidth Flash): flash memory like your typical NVMe, but designed around maximum speed. So instead of loading the models into expensive HBM, you use the density of flash memory to offload the models into it. Cheaper, but slower... But it leaves your expensive HBM free for things like the KV cache, active parameters, etc. So your model will be slower, but you're running it hybrid: faster than running a model from system memory with normal DDR memory, but not as fast as HBM.

So yeah, there is a lot in development to reduce the dependence on resource-hungry HBM. For the wafer cost of 1GB of HBM, you would normally get 4GB of normal memory. That is why the world supply of memory dropped: not just the insane buying, but because HBM is just very inefficient in wafer usage.

Can we not use DDR4 production and create some kind of hybrid solution? Sure, but the big 3 moved away from DDR4 in favor of DDR5 a long time ago. We have competition from China with a mix of DDR4/DDR5, but they also need to scale up. Nobody expected to see a large part of the world's production vanish into HBM...

Even if it's about DDR4 and older nodes, ironically, most companies had been moving away from DDR4. There is only so much wafer capacity in the world, to the point that companies are moving to DDR2... Yeah, not a typo, like 2007 DDR2! For IoT devices etc., stuff that does not need fast memory, because even DDR3 got too expensive for them.

It's not like the old nodes aren't used anymore, as if that capacity were sitting idle. It was still in production making other stuff. The only real solution is more fabs, and those take years to build. And the big 3 delayed investing in new fabs for a long time, unsure about the whole AI bubble. In other words, they did not want to build a ton of fabs only to end up with overcapacity if AI growth collapsed.

by bradfa6 hours ago|

With MoE models like DeepSeek's and with multiple Crescent Island accelerators, the aggregate memory throughput actually doesn't look that bad. Two Crescent Island cards get roughly 1400GB/s, and DeepSeek-V4-Flash with 13B active parameters nets roughly 100 t/s, which is decent for a small team or great for a single user.

Adding more Crescent Island cards scales this up, although likely not entirely linearly.

But all GPU inference works like this; it's not specific to Intel. Intel just promises more affordable cards with big memory, so they're attractive.
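The ~100 t/s figure can be sanity-checked with the usual bandwidth-bound estimate. In single-stream decode, each token has to stream the active weights from memory at least once, so tokens/s is at most bandwidth ÷ bytes of active weights. The sketch below makes some assumptions and simplifications:

- The 8-bit and 4-bit weight widths are assumed; the comment doesn't state a precision.
- KV-cache reads, compute limits and multi-card interconnect overhead are ignored.

```python
def decode_tps_ceiling(bandwidth_gb_s: float, active_params_billion: float,
                       bits_per_param: float) -> float:
    """Upper bound on single-stream decode tokens/s for a bandwidth-bound model:
    each token reads every active weight once; nothing else is counted."""
    gb_per_token = active_params_billion * bits_per_param / 8
    return bandwidth_gb_s / gb_per_token


if __name__ == "__main__":
    # 1400 GB/s and 13B active params are from the comment above;
    # the bit widths are assumptions.
    for bits in (8, 4):
        print(bits, round(decode_tps_ceiling(1400, 13, bits), 1))
```

At 8 bits this gives a ceiling of about 108 tokens/s, in line with the ~100 t/s quoted, and at 4 bits the ceiling roughly doubles. Measured single-stream speed will land below the ceiling. Batching, MTP or speculative decoding can raise throughput by amortizing each weight read over several tokens.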

by lightedman4 hours ago|

Anyone want to bet that much like speculative execution, speculative decoding is going to introduce a whole slew of vulnerabilities in the ways LLMs work?

by 28383838388 hours ago|

Must be wonderful to be on the board of OpenAI et al. & their PE investors whilst China keeps blowing up these mines under their feet lmao. Luckily, Korean pension funds will buy all the trash as usual, but goddamn, you gotta start moving quick or you're gonna need some serious AGI to show you how to offload those bonds

by ForHackernews8 hours ago|

"We will build the machine-god and pray for it to pay for itself."

by FridgeSeal8 hours ago|

Every day, the rate of “could post a picture of 40k tech priests and have it taken unironically” goes up, and it’s starting to get concerning.

by ozgrakkurt8 hours ago|

Don’t worry they will sell all the hardware and data they acquired with their grift

by preetham_rangu9 hours ago|

do they use their own OCR, or someone else's?

by einrealist1 hours ago|

Yet another band aid.

by playorizaya5 hours ago|

Meanwhile OpenAI is drafting an “open letter” to Congress /s

OpenAI and Anthropic are doing nothing interesting.

Basically forgot about them 2 years ago.

I don’t use DeepSeek either but at least they do interesting stuff - they were the first to do “thinking” iirc
