Google is still releasing a lot of LLM architecture research. They introduced speculative decoding of LLMs in 2022[1], then released the code to perform speculative decoding for their Gemma 4 model this year[2].
[1] https://arxiv.org/abs/2211.17192
[2] https://github.com/google-gemma/cookbook/blob/main/docs/mtp/...
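For anyone who hasn't seen the technique, the accept/reject rule at the heart of speculative sampling as described in [1] can be sketched in a few lines. This is a toy illustration only, not Google's or DeepSeek's code: the function name `speculative_accept` and the plain-list distributions are made up for the sketch, and real implementations work on batched logits.

```python
import random

def speculative_accept(p, q, drafted, rng):
    """Verify one drafted token (hypothetical helper, for illustration).

    p: target-model probabilities over the vocabulary
    q: draft-model probabilities over the same vocabulary
    drafted: index of the token the draft model sampled from q
    Returns (token, accepted).
    """
    # Accept the drafted token with probability min(1, p/q).
    if rng.random() < min(1.0, p[drafted] / q[drafted]):
        return drafted, True
    # On rejection, resample from the residual max(0, p - q), renormalized.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(residual)
    weights = [r / total for r in residual]
    token = rng.choices(range(len(p)), weights=weights, k=1)[0]
    return token, False
```

The point of the residual resampling is that the emitted tokens still follow the target distribution `p`, so the draft model only affects speed, not output quality.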
Qwen 3.6 shipped with working MTP first, and had working MTP in llama.cpp first.
Ultimately though the real explanation, I think, is Google doesn't care since for their own purposes (in LiteRT-LM), they do bundle them. As far as I know, anyway.
Revealing optimizations similar to these would pretty much reduce their competitive position.
I suspect their tune will change if they ever take the lead.
Destroying the growth story of overvalued stocks is an interesting investment strategy. It's not even new. Shortsellers understandably get terrible rep from execs, but their actions are more often in the public interest than you'd think. Normally it's exposing fraud, but here we get the really fortunate side benefit of what could eventually amount to the most significant contribution to the general software community since Linux.
It's revealing that they always seem to publish after some big announcement by American AI companies. But regardless, this is one of the benefits of a duopoly.
Chinese AI companies also face a domestic market where open-source distribution is often the only way to reach enterprise clients who won't pay SaaS premiums. The business logic aligns with openness in a way that US labs' VC-funded models don't.
That used to be true, but now they've raised ~$7B, so we'll see how / if that changes.
Chinese companies understand this and they're treating models as shared infrastructure akin to Linux. The money is going to be in customization niches. Companies will charge to tune models for specific use cases and charge support for that. There's also going to be money at the bottom for hardware vendors making chips and memory. But the middle tier of generic LLMs is seeing involution where there's relentless competition driving profits towards the bottom.
Wikipedia is altruistic, and serves humanity quite well.
Contributing to it might not necessarily be. Most open source development is funded by large companies after all, and from their perspective it can function as a cost-saving measure, allowing them to focus on their core products and removing the possibility of their rivals getting a competitive advantage from a superior low-level stack under their product.
Which is why open source is so successful in areas where software is a cost center but has mostly failed for consumer products (since spending resources on them would actually be altruistic, unlike e.g. Linux kernel development).
any altruistic act can be perceived as self serving
Software engineers need money to survive. If they exclusively work on open source stuff where are they getting money from to survive? Follow the money trail… even a donation… eventually it leads to an incentive based source or action.
From open source. You can earn money from open source. Open source is not opposed to capitalism, idk where you got that idea.
[†] Another problem with altruism: we don't all agree on whether a goal is altruistic, and what's altruistic in the enactor's eyes might not be in yours. Curating a fountain of human knowledge like Wikipedia? Probably altruistic. Protecting humanity from itself by installing your company as the stewards of frontier LLMs? Not so altruistic in my view.
The War on Drugs had the purpose (not just in its origin but in its perpetuation) of inflicting harm on elite-disfavored subsets of the population that could not be openly targeted for Constitutional reasons, which is about as far from an altruistic reason as it is possible to get.
Any individual that provides free labor cannot survive off of said free labor. He must work for money to survive or get donations from someone who earned that money from incentive based labor in order to even buy the food he needs to exist as a living human being. Much of the time that labor is actually closed source.
This is a logistical reality. A lot of open source advocates are unable to get their brains out of the whole mentality that open source literally cannot exist without incentive based software supporting it. Who pays for GitHub to exist? Who pays for the food swes eat? I just code for open source all day and money falls out of the sky.
My smart friend says there are jobs that pay you to work on open source exclusively. Smart guy. In this case you follow the money trail. How does that company get enough money to pay a guy to work exclusively on open source?
Put it another way: if we removed your ability to do incentivized labor and all you could do is charity work… you would run out of money and die from starvation. If we did the opposite and we removed your ability to do charity work… you’d be fine.
All of this re-emphasizes the point of this thread: In our objective reality, the world is driven by incentive based work while altruism is a side effect of surplus wealth generated by incentive based work. That is the fundamental reality.
I don't see an inconsistency. money is pragmatic, the mission needs money
The real mission statement for most companies is to make as much money as possible.
Markets don't run on altruism.
Meanwhile we in the US are blocked from buying Huawei GPUs and retirees are boasting about the nvidia in their portfolios.
US labs in Google, Meta and SpaceX are not leading, none of them managed to build something on par with GLM 5.2.
Care to explain to me why they still don't collaborate and still choose to do it in private?
From a practical POV having all the training data, training infrastructure, and training know-how wouldn't help you either unless you could afford to spend the millions of dollars (hundreds of millions for a SOTA model) in compute to train it each time they released a new training set, in which case you're only talking about the big commercial companies. "open source for the people" just does not apply.
Those are mostly for embedded devices and the current "sponsor" is Apple.
Even if they're ahead they don't have enough GPUs to scale. Open sourcing is hence a good strategy to at least get market share (even if not $).
I say this because we see the same thing used as an argument against China. "If they overtake us, they'll do imperialism (like us)." Again, it says more about us than them.
A better reading (IMHO) of the situation is that China believes that AI shouldn't be used simply to mint a few more trillionaires but the benefits should be shared with society. Why do I say this? Because we now have 70+ years of China doing exactly that. The transformation in China all the way from rural villages to Tier 1 cities has been utterly astounding. China has lifted ~800M people out of extreme poverty.
In some ways we're at a similar point to the late 1990s and 2000s when Microsoft execs complained that Linux, being free, destroyed intellectual property value. Linux should be a perfect example of how people can and do act altruistically, or at least not in a way to bait-and-switch to enrich themselves.
[1]: https://www.reddit.com/r/AskHistory/comments/1d26grm/in_the_...
Meanwhile, Xi Jinping has published his 5th book on how governance in China works and what they're after. These are not books written for a Western audience: they're compilations of speeches that he already gave to the Chinese party and state apparatus, so the contents are not sanitized for foreign audiences. But there are no English reviews or summaries of this 5th book at all by the usual China experts who shape what Western audiences know about China.
This extends to beyond the government. Even though "for the people but only against the government" is an often-heard mantra, nobody seems to listen to what Chinese AI companies themselves say about why they publish open models. DeepSeek and GLM have said multiple times publicly what their motivations are, yet people on HN still speculate like they usually do.
Truly mind-boggling. I get that a lot of people don't like China. But setting aside the question of whether their dislike is justified, it would at least be rational to properly understand China, even if it's to defeat it. And listening to what China says themselves is absolutely essential for proper understanding. But people don't bother to? And they seem mostly happy with sticking to speculations that match preconceived notions, even if that hurts their chances of defeating China.
For something shorter, you can see Arnaud Bertrand's recent review. https://arnaudbertrand.substack.com/p/the-book-the-west-refu... The review is behind a paywall, but not expensive.
If you want to read policy documents directly (primary source), try the State Council / Chinese government policy database: https://www.gov.cn/zhengce/ and https://sousuo.www.gov.cn/zcwjk/policyDocumentLibrary
They also provide official translations: https://english.www.gov.cn/policies/
For Central Party documents: https://news.cn/politics/zywj/. It lists recent Central Committee / General Office / joint Party-State documents, e.g. 2026 documents on township duty lists, Party member development rules, carbon evaluation, long-term care insurance, and SOE leadership rules.
If you simply take what the Chinese government says at face value, you will be correct way more often than 95% of Western policy wonks, media talking heads, "analysts" and so forth. Because, like you say, they tell you everything they're doing.
In the recent US-China summit, Xi Jinping just came out and used the Thucydides Trap metaphor, which tells you everything about where China thinks it is and where it sees the US going, which is to become increasingly belligerent as their power declines. Now whether or not you agree with that assessment (I do agree), it still tells you China wants to avoid open hostilities, it sees itself as continuing to rise and it fears what a declining US might do.
But western politicians keep raising this metaphor. So at some point they're like "okay we'll speak your language". They then used this metaphor to make the point "our rise isn't the threat, your fear of it is. If you resist it you're walking right into the trap Thucydides warned about". So your conclusion is still right, they don't want open hostilities, a stable world is in their interest.
Then western media ran away with this and were like "OMG Xi mentioned the Thucydides Trap", completely ignoring his point.
1) The CEO himself 2) Tencent 3) CATL (the battery company) 4) NetEase (internet/media company) 5) JD.com (ecommerce) 6) Chinese investment firms
What are they expecting in return? I'd say the same thing that all those investors in OpenAI and Anthropic are expecting - profit.
[0] https://finance.sina.com.cn/stock/vcpe/2026-06-11/doc-iniazi...
https://www.oecd.org/en/data/dashboards/magic-database-indus...
And regarding the dataset:
> Unlike most OECD databases, which rely on government data provided at country-level, the OECD MAGIC database uses firm-level data. The subsidy estimates included in the database are based on raw data obtained from firms’ annual reports, financial statements, bond prospectuses, IPO prospectuses, etc. The data are collected and verified manually by the OECD to maximise accuracy, consistency, and comparability. In some cases, additional information is also obtained from government databases, either to verify the firm-level information or to complement it. Care is taken to avoid double-counting where the data mix corporate and government sources.
Which will likely help them bolster the sales of the MANY new AI chips in development/use in China to international markets. Dislodging Nvidia.
Kinda the opposite of what Jensen Huang (Nvidia) thinks US is doing: https://www.youtube.com/shorts/u3SY8nvjhQA
Edit: I'm a fan of deepseek and believe it's good to make the technology open/available. And I do think that also helps business - which I support as well.
Edit 2: No idea why I'm getting downvoted. That's also their official stance https://english.www.gov.cn/news/202601/08/content_WS695f1b55...
???
Profit!
Not suggesting this is it, but you know, one possible angle.
Is there anywhere public anymore that isn’t being overrun by lobotomized p-zombies (partisan zombies)? Is it even possible to make such a public space? Ressentiment consumes all discourse.
That's a lot of words to say it's just capitalist greed.
What's with all the China glazing about this stuff? They release some open-source work and people act like they are suddenly the beacon of freedom and transparency.
Hopefully the experts here can offer insight. The above is just my hunch and I’m not a specialist in this field.
So, despite hiring the cream of the crop of math graduates, who could read the papers of free academia, but whose own results the free world could not access - they fell behind.
I have a theory explaining why. I think it's because science is an interactive process. NSA cryptographers could read papers, but they couldn't talk openly with the authors of those papers, because of secrecy demands - even asking a question might indicate what they were working on. You can easily imagine them spending months on something they could have avoided by going to the original authors and getting told "Oh, we tried that for a long time, it doesn't work".
Whether that theory is right or not, cryptography is a concrete example of a domain where public research with fewer resources beat private research with a lot more resources.
The American companies, from my impression, don’t involve themselves with such lowly “hacks” because they have so much money to just push forward with doing everything on big heavy models that run on the most cutting-edge nvidia chips that they can, at the moment, kinda sorta get on demand (I say that in some degree of jest).
They don't develop them because they don't collaborate publicly anymore.
Where would the whole industry be if Google never allowed publishing the transformers paper?
It's not a coincidence that the American AI industry grew fastest in capability when it was the most open.
It's more a cultural thing. Sharing progress is just in their blood.
Multi-head Latent Attention (MLA), Multi-Token prediction, MoE architecture are some of the most famous examples.
MTP is from Meta
Another DeepSeek advance that the west are copying is DeepSeek Sparse Attention (DSA)
[1] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, G. E. Hinton, Adaptive mixtures of local experts. (1991)
[2] M. I. Jordan, R. A. Jacobs, Hierarchical mixtures of experts and the EM algorithm. (1993)
[3] L. Xu, M. Jordan, G. E. Hinton, An alternative model for mixtures of experts. (1994)
[4] S. Waterhouse, D. MacKay, A. Robinson, Bayesian methods for mixtures of experts. (1995)
[5] N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. Le, G. Hinton, J. Dean, Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. (2017)
https://www.cnbc.com/2026/06/24/anthropic-alibaba-distillati...
So many Americans seem to (at least in theory) be ready to sign up for this ongoing confrontation with China. Does anyone think it isn't America who is poking the bear when it comes to the Thucydides trap? Why not try to get along? It occurs to me the only people more Chinese innovation would hurt are the mega cap class in the United States. Elon Musk certainly doesn't want BYD in the United States. Same story all the way down with these super capitalized AI companies. Most average Americans would probably be better off in a world where the United States and China got along. But it's those Americans who will be called upon to suffer most of the burden if that trap ever springs.
Why not talk about how China shut out American companies for decades before complaining about BYD?
As an Indian immigrant, I'd point out that the PRC has engaged in conflict with almost all its neighbors and started wars in its short history.
China is not so benevolent when they get to the #1 spot:
https://m.economictimes.com/industry/renewables/china-wto-co...
As for the rest of it:
They don't have TPUs or access to the latest Vera Rubin GPUs either to get performance gains for free. All of the optimizations Deepseek have done are in software and it goes down to the PTX assembly level.
Compare that to Anthropic, who are celebrating fixing a flickering issue in a terminal app which took months to fix.
DeepSeek are still using NVIDIA (PTX) to train on, but for inference have already transitioned to Huawei Ascend chips, and inference speed is what this paper is addressing.
It's funny, because if you ran Claude Code on a slow terminal, the cause of the flicker was obvious: they kept dumping the entire history of the chat back into the terminal in a number of situations, and relied on the terminal to then end up in the correct state.
More likely is that an AI-generated codebase is impossible for humans to fix, and SOTA was not able to figure it out until now.
We in the United States will never forget!
For all the harm Trump does to the US at least he is helping China!
What became clear when DeepSeek came onto the scene was that China was seeking to commoditize LLMs. They consider it an issue of national security not to be beholden to US tech companies when it comes to AI. And I, for one, fully endorse this policy.
Another data point on this is the black market for Claude tokens in China [1]. The chat logs themselves are a commodity to train models.
I believe that OpenAI in particular is a bet on a trillion dollar pot of gold that doesn't exist. Google, Microsoft, Amazon and Meta will all be fine. Anthropic is in a far better position than OpenAI (IMHO) but if DeepSeek or some other Chinese open weight model gets as good at coding, they're in real trouble too.
There is a meteor headed towards all this AI investment that I don't think has been properly accounted for, and that is: what happens to all the existing hardware investments when NVidia's next architecture comes out? Blackwell (B100/B200) is the current generation. Rubin (R100, presumably R200) is the next and arrives soon. Now a lot of the investment hasn't been spent yet so will likely be spent on Rubin but at that point, what happens when the next iteration comes out and does 3-4x the compute for the same electricity input and same hardware cost?
Also, what happens when people can run way bigger models on consumer hardware in 5 years? The effective limit for useful local LLMs is currently ~31B parameter models because the RTX 5090 has 32GB of VRAM and Apple's shared memory architecture, which can keep bigger models in memory, just doesn't have the raw processing power.
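A quick way to sanity-check the "~31B on 32GB" figure is to compute the weight footprint at different quantization levels. This is a rough sketch that counts weights only: it ignores KV cache, activations and runtime overhead, so a model that "fits" here still needs headroom in practice. The helper name `weight_memory_gb` is made up for the example.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Memory needed for the weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

VRAM_GB = 32  # the RTX 5090 figure quoted in the comment above

for bits in (16, 8, 4):
    gb = weight_memory_gb(31e9, bits)
    verdict = "fits" if gb <= VRAM_GB else "does not fit"
    print(f"31B @ {bits}-bit: {gb:.1f} GB, {verdict} in {VRAM_GB} GB")
```

So the limit only holds at roughly 8 bits per weight or below; at 16-bit the same model needs about twice the card's memory.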
Anyway, why I argue Anthropic is in a better position (than OpenAI) is that they seem to have captured a market that may well be profitable for them as a company, specifically Claude for coding. So they just haven't burnt quite as much cash as OpenAI so aren't in as deep of a hole.
While I think local models are going to improve massively over the next few years, running them in a data center at scale is always going to be cheaper for a company. Why? Because they can amortize their costs by running 24/7 and powering them and cooling them is simply cheaper at scale when you're talking about 1000+ engineers who otherwise might only be using their hardware ~40 hours a week.
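The amortization argument comes down to utilization. Here is a small sketch of the arithmetic, under loud assumptions: the $10,000 price and 4-year lifetime are placeholders, power, cooling and networking are left out, and the data center is assumed to keep the hardware busy around the clock.

```python
HOURS_PER_WEEK = 24 * 7

def cost_per_used_hour(hardware_cost, lifetime_years, used_hours_per_week):
    """Hardware cost spread over the hours the machine is actually in use."""
    weeks = lifetime_years * 52
    return hardware_cost / (weeks * used_hours_per_week)

# Placeholder figures, not real pricing.
desk = cost_per_used_hour(10_000, 4, 40)
datacenter = cost_per_used_hour(10_000, 4, HOURS_PER_WEEK)

print(f"utilization at 40 h/week: {40 / HOURS_PER_WEEK:.0%}")
print(f"cost per used hour, desk vs datacenter: {desk / datacenter:.1f}x")
```

The ratio depends only on the hours used per week, so the placeholder price and lifetime cancel out.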
IMHO Google is in the best position here of all the US companies, even though their models aren't the best, because their data centers are ruthlessly efficient, their homegrown TPUs will eventually catch up (and thus avoid the NVidia tax) and they simply haven't bet the farm on winning AI.
However, Google probably won't catch up. Nvidia has been winning in spite of the fact that their hardware is general purpose rather than tuned for inference.
Rubin has architectural differences I don't understand that are supposed to make inference much cheaper and faster while still retaining those other more generic capabilities. Their next generation after that is going to do even better at being fast for inference and general purpose.
Google is betting that their TPUs won't depreciate faster than the markup they have to pay to Nvidia. I don't think they will be right.
A100s are ~7 years old and going for more than 2 dollars an hour, significantly more expensive than even 2 years ago. This is because anything with 80gb of VRAM or more and made by Nvidia will have economically useful lifespans of like, 10 years.
I could see H100s getting 12 years.
Michael Burry doesn't know shit about GPUs.
Now jump ahead 2 years and you seem to have a massive jump in performance [1]. The tokens/Watt goes up by at least 2 orders of magnitude. And the B100 is 3-4x that. And we're about to hit the R100 (Rubin) cliff.
That's what this is going to come down to. When hyperscaler DCs are getting to gigawatt power usage, it all comes down to power efficiency. Those A100s aren't far from being sold for scrap.
I've been looking into how different companies are handling depreciation for this. Amazon seems to be saying the life is 3-4 years, Google 4-5 and Meta is saying 8+, which I think is wildly optimistic.
[1]: https://lambda.ai/inference-models/deepseek-ai/deepseek-v4-f...
anyone with IQ higher than 130 (thus qualified for actual AI R&D) would be questioning something obvious here -
if they are already doing such dodgy stuff with the aim to maximize profits, why would those resellers have large amount of logs with actual American model responses to sell to those AI labs in the first place. shouldn't they just post train & customize some leading Chinese open source models to pretend to be Opus or GPT for the vast majority of their users (as classified by some models) who don't know much about expected Opus behaviours & not skilled enough to tell the differences?
that is actually the interesting bit not covered in your censored version of the story line, it is also what happens on the ground. your censored version of the story implies that those dodgy resellers using stolen credit cards, pooling accounts with stolen IDs and illegally selling very personal logs would somehow be honest enough to spend extra $ to ensure their victims (aka paying users) can actually use real Opus and GPT. LOL
dude, you failed this IQ test miserably.
https://yipzap.com/anthropic-accuses-alibaba-of-largest-ai-d...
Don't even try to combine it with any notion of "leadership" then, however, since distillation is literally "copying the actual leader"
(and if you argue the US models do produce copyrighted works, then oooops - whose copyright is it huh?)
There's no "leader" if, absent someone whose results you're copying, you are an emperor without clothes
And certainly they have no idea whether these outputs (assuming they ever existed and it wasn't made up) were used for training. The article mentions that DS made 150k requests. This isn't much and might have been just an eval or a benchmark to compare their own model against. It's really hard to believe DeepSeek had any Claude outputs anywhere in their training schedule, since it's just too different. Besides training on random vibecode of course, which is mostly written by Claude.
Imagine if your casio calculator would come with a ToS that says you can't use it to develop a competitor calculator or any other tools. Or that your hammer can't be used to make other tools. Or, closer to the HN crowd, imagine MS in the 90s saying that you can't use their OS to build competing services to MS. They'd be laughed at and be split immediately if they tried that.
The only thing they can do is to refuse serving tokens (and even that's debatable, if we get to tokens being commoditised). But that's gonna be a game of whack-a-mole, and they know it.
Flash: https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash-DSpark
Pro: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro-DSpark
Excited to see if this makes it into DwarfStar for local inference, have been using the flash model extensively since the 2-bit quants were made available by antirez.
Especially since your 5-day-old account is sus, and thus likely not yet proven not to be a Chinese bot
You can't lead by following the actual leader LOL
The only real innovation I've seen from Deepseek is the out-loud reasoning thing in R1
I'd also include the other Chinese labs like Moonshot (behind Kimi) and Z.ai (behind GLM). They are innovating and continue openly sharing their research with the public. I believe the founder of Moonshot even shared a 40-minute video on Twitter where he goes through the techniques that power Kimi.
The strategy for most companies in the US has for a long time been to capture the social audience, whatever the means. Quality and innovation are a secondary factor. Capture the market, lock in the users, influence regulation and lobby to keep the power.
They compete with each other by innovating. The innovations result in more utility for the customer, but the technology isn't made public. Trade secrets are secret for a reason.
The reason people may think that DeepSeek is the "most innovative" is because of what they can observe from the outside, much like people may mistakenly conclude models are the "prettiest of the population" because not everyone is photographed for public consumption.
To compete in that direction, USG needs to learn from CCP to "seize the means of production", which they are sort of doing, but in such an incompetent way that I'm afraid we will probably end up mixing the worst of both communism and capitalism.
No they don't. The U.S. Government is free to launch their own AI labs if they wish -- and even compete with the private sector -- but that doesn't mean they have to confiscate existing investments and capital. But Congress is unlikely to do that, because we've learned in the course of history that in well-functioning competitive markets, publicly-operated services tend to be worse than private ones across multiple dimensions.
Chinese companies are largely where they are not because they're state funded, but because they operate in ways that would be considered criminal in the U.S. If they didn't constantly trespass on OpenAI and Anthropic to try to achieve product and technological parity, they would be too far behind to produce innovative research.
In this case, it feels like they are just funding multiple independent pure research projects and letting the chips fall where they may.
Doesn't even really seem like Europe can coordinate that.
It's drastically reduced my AI spend. I went from spending $40/day to $10/day.
I second ccusage, it's nice
Guessing the timing isn't accidental. Demonstrated openness vs harsh regulation
Strange timeline, though this only works because it’s aligned with Xi’s goals.
Mistral...don't fumble this
This paper seems to be an improvement to speculative decoding but I haven't read it yet.
this is definitely where things are going. the enormous "eat the world" models have extreme diminishing returns by comparison.
It’s like with VPN providers. Is Mullvad actually collaborating with law enforcement? They very well could be. It is a calculated risk.
Is DeepInfra actually logging and training or selling the logs? They could be.
They have been raided multiple times, have had tons of audits, do bleeding-edge research on privacy-preserving tech, donate to GOS, etc etc. You don't see this kind of VPN company at all because none exists.
Well I can't think of even one at the moment, to be honest might be biased but all Chinese research labs are largely oss except Alibaba now.
I am certain there are lots of American labs that claim to do it, but either they are marketing hype since they aren't even close to the frontier, or conversely they just don't make anything of significant value public/oss.
Can't sell their SOTA models, only slightly better than the open source models for the models they can sell, cost 20x to 50x for good models, a TAM that consists almost solely of developers, with no customer of theirs actually boasting increased profits as a result of AI...
I fear their time to IPO may have passed.
If the business model requires hundreds of billions to get the required quality (R&D but also infrastructure to collect data and train, either purchased or rented from a 3rd party) while "only" dozens of billions can be earned back (as costs still exist to earn, it's not free once models are trained), then maybe there NEVER was nor will be a good time for an IPO in a rational market.
Unfortunately the market is often not rational in this way.
Hype within the retail market means there are suckers willing to buy. The institutional market knows there are suckers when the hype is high. Both would drive the price up, and retail investors are the ones left when it falls.
It should improve performance on most hardware because most LLMs are memory bandwidth bound during decode.
> As with V4-Flash, we treat this point as an indication that DSpark sustains useful throughput under an interactivity target that the baseline cannot efficiently support. At matched system capacities, DSpark delivers 57% to 78% faster per-user generation.
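The memory-bandwidth argument can be made concrete with a back-of-the-envelope model. This is a rough sketch, not the paper's methodology: it assumes each decode pass streams all active weights once, ignores KV-cache traffic, compute limits and the cost of running the draft model, and every number below (30B active parameters, 8-bit weights, 3000 GB/s, acceptance rate 0.8, 4 drafted tokens) is an illustrative placeholder. The expected-tokens formula is the one given in the 2022 speculative decoding paper (arXiv 2211.17192).

```python
def decode_tokens_per_second(active_params, bytes_per_param, bandwidth_gb_s):
    """Upper bound on single-stream decode speed when weight reads dominate."""
    bytes_per_pass = active_params * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_pass

def expected_tokens_per_pass(alpha, gamma):
    """Expected tokens emitted per target-model pass, given gamma drafted
    tokens and a per-token acceptance rate alpha (0 <= alpha < 1)."""
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)

# Placeholder hardware and model figures, chosen only for round numbers.
baseline = decode_tokens_per_second(30e9, 1, 3000)
gain = expected_tokens_per_pass(0.8, 4)

print(f"bandwidth-bound baseline: {baseline:.0f} tok/s")
print(f"ideal speculative gain:   {gain:.2f}x (draft cost ignored)")
```

Because one pass over the weights can verify several drafted tokens at once, the same memory traffic yields more output tokens, which is why the gain shows up on bandwidth-bound hardware.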
Reminds me of the flawed solution in scaling servers in 2017 that use memory-intensive technologies by adding even more servers to solve the problem. (It just increases costs.)
Rather than doing that, think about which critical parts of your app can be written in a more performant technology.
Fast forward to 2026, now you can see who is just throwing more money at the problem to create even more problems, whereas DeepSeek is giving us optimized solutions.
I know exactly who I would pay attention to, and it is absolutely not Anthropic.
The last year has shown that’s not true anymore (even for web servers).
But I vote for these heroes with my wallet. Just yesterday did again.
State-of-the-art nanometer nodes are impossible to achieve, but if you have infinite solar energy during business hours does it really matter? Every company has a parking spot so this ASIC-like appliance could be as big as a shipping container.
If it could just run recent open models for a handful of users it would be such a nobrainer to buy.
Did I mention there are only so many memory makers and they are all busy printing money with HBM memory?
Intel is trying with Crescent Island, to make a 160GB GPU that uses LPDDR5X memory.
HBM takes multiple times the resources to make vs basic DDR5 memory. So by going this route, you have more memory, with the disadvantage that it's only 700GB/s, vs HBM pumping out terabyte-per-second numbers like it's nothing.
If these cards are reasonably priced, they may be a good alternative to $10k 96GB Nvidia Blackwells... You give up on token generation speed (heavily memory-bandwidth dependent) for more memory to run larger models at home/office/company servers.
The problem is, again, there are only so many memory makers and it's not like the market is flooded with DDR5 memory anymore, as the big 3 moved a lot of production to HBM.
Another approach is Sandisk making HBF ... flash memory, like your typical NVMe but designed around maximum speed. So instead of loading the models into expensive HBM memory, you use the density benefits of flash memory to offload models into that. Cheaper, but slower... But it leaves your expensive HBM memory free for things like KV cache, active parameters, etc... So your model will be slower, but you're using a hybrid: faster than running a model from system memory with normal DDR memory, but not as fast as HBM.
So yeah, there is a lot in development to reduce the dependence on that resource-eating HBM memory. For the wafer cost of 1GB HBM, you normally get 4GB of normal memory. That is why the world supply of memory dropped: not just the insane buying, but because HBM is just very inefficient in wafer usage.
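To see how the 4:1 wafer figure turns into a supply drop, here is a toy calculation. It takes the 4:1 ratio from the comment at face value (it is not independently verified), and the 40% HBM share is a made-up number purely for illustration.

```python
def memory_output(wafer_units, hbm_share, hbm_penalty=4.0):
    """Relative GB produced when hbm_share of wafer capacity goes to HBM.

    One wafer unit yields 1 GB-equivalent of conventional DRAM, or
    1 / hbm_penalty GB of HBM (the 4:1 figure quoted in the thread).
    """
    dram_gb = wafer_units * (1 - hbm_share)
    hbm_gb = wafer_units * hbm_share / hbm_penalty
    return dram_gb, hbm_gb

# Hypothetical: 40% of a 100-unit wafer supply is switched to HBM.
dram, hbm = memory_output(100, 0.4)
print(f"DRAM: {dram:.0f}, HBM: {hbm:.0f}, total: {dram + hbm:.0f} (was 100)")
```

Under those assumptions the total gigabytes shipped fall by about 30% even though no fab capacity went offline.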
Can we not use DDR4 production and create some kind of hybrid solution? Sure, but the big 3 moved away from DDR4 in favor of DDR5 a long time ago. We have competition from China with a mix of DDR4/DDR5, but they also need to scale up. Nobody expected to see a large part of the world production vanish into HBM...
Even if it's about DDR4 and older nodes, ironically, most companies had been moving away from DDR4. There is only so much wafer capacity in the world, to the point that companies are moving to using DDR2 ... Yeah, not a typo, like 2007 DDR2! for IoT devices etc, stuff that does not need fast memory. Because even DDR3 got too expensive for them.
It's not like the old nodes are not used anymore ... like that capacity was sitting idle. It was still in production making other stuff. The only real solution is that we need more fabs, and those take years to build. And the big 3 delayed investing in new fabs for a long time, unsure about the whole AI bubble stuff. Aka, they did not want to build a ton of fabs and end up with overcapacity if the AI growth collapsed.
More Crescent Island scale up, although not likely entirely linearly.
But all GPU inference works like this, it’s not specific to Intel. Intel just promises more affordable cards with big memory so they’re attractive.
OpenAI and Anthropic are doing nothing interesting.
Basically forgot about them 2 years ago.
I don’t use DeepSeek either but at least they do interesting stuff - they were the first to do “thinking” iirc