TPUs are at least dogfooded by Google DeepMind; no team AFAIK has gotten the AMD stack to train well.
Pull quotes:
> AMD’s software experience is riddled with bugs, rendering out of the box training with AMD impossible. We were hopeful that AMD could emerge as a strong competitor to NVIDIA in training workloads, but, as of today, this is unfortunately not the case. The CUDA moat has yet to be crossed by AMD due to AMD’s weaker-than-expected software Quality Assurance (QA) culture and its challenging out of the box experience.
[snip]
> The only reason we have been able to get AMD performance within 75% of H100/H200 performance is because we have been supported by multiple teams at AMD in fixing numerous AMD software bugs. To get AMD to a usable state with somewhat reasonable performance, a giant ~60 command Dockerfile that builds dependencies from source, hand crafted by an AMD principal engineer, was specifically provided for us
[snip]
> AMD hipBLASLt/rocBLAS’s heuristic model picks the wrong algorithm for most shapes out of the box, which is why so much time-consuming tuning is required by the end user.
etc etc. The whole thing is worth reading.
I'm sure it has improved since then (and will continue to). I hear good things about the Lemonade team (although I think that is mostly inference?)
But the Nvidia stack has improved too.
Amazon's compensation strategy, in which you primarily get a raise years in the future by tricking your management chain into promoting you, is definitely bearing its rotten fruit.
If they had this management attitude, they wouldn't have been so far behind as to need this action in the first place!
> “Are we afraid of our competitors? No, we’re completely unafraid of our competitors,” said Taylor. “For the most part, because—in the case of Nvidia—they don’t appear to care that much about VR. And in the case of the dollars spent on R&D, they seem to be very happy doing stuff in the car industry, and long may that continue—good luck to them.
https://arstechnica.com/gadgets/2016/04/amd-focusing-on-vr-m...
"car industry" is linked to the GPU-accelerated self-driving car work, ie, making neural networks run fast on GPUs: https://arstechnica.com/gadgets/2016/01/nvidia-outs-pascal-g...
Maybe Amazon is an example of how this happens even to hardware divisions within software/logistics companies.
ROCm works great too; the only issue I have had is that my machine froze a couple of times when it used 100% of the GPU and the OS had nothing left. Since moving to Vulkan I stopped getting these errors, apart from a little UI slowdown when I had 4 models loaded at the same time taking turns.
I'm also on an i7 6700 with 32GB DDR4, so I'm sure that is causing more slowdowns than the graphics card.
Anthropic did retire an interview take-home assignment involving optimising inference on exotic hardware, because Claude could one-shot a solution, but that was clearly a whiteboard hypothetical rather than a real system with warts, issues and nuance.
But AMD does not want to pay these specialized SWEs the market rate. Their existing SWEs would be up in arms saying, basically, "what are we, chopped liver??", or so the thinking goes.
So AMD is stuck with a shitty software stack which cannot compete with CUDA.
If I were making such decisions, I would just cull the number of existing SWEs down by 50%, and double the pay for remaining ones. And then go out and hire some top talent to build a good software stack.
Freudian slip?
int8 quantization seems like it's almost supported, but not quite. speeds drop to a fraction of full precision speed and the server seems like it intermittently hangs. int4 quantization not supported. fp8 quantization not supported.
again, maybe AMD is just being lazy with what they've provided, but it's not a great look.
right now the fastest smart model i can run is full precision qwen3-32b. with 120 parallel requests (short context) i'm getting PP @ 4500 tokens/sec and TG @ 1300 tokens/sec
From the papers I've read and the labs I have worked in personally, I would say that most scientists developing deep learning solutions use CUDA for GPU acceleration.
What's unclear to me is how much Google uses GPUs for their own stuff. Yes Gemini runs on GPUs now, so that Google can sell Gemini on-prem boxes (recent release announced last week), but is any training or inference for Gemini really happening on GPUs? This is unclear to me. I'd have guessed not given that I thought TPUs were much cheaper to operate, but maybe I'm wrong.
Caveat, I work at Google, but not on anything to do with this. I'm only going on what's in the press for this stuff.
Do you have any more information on this? I only found this article about it: https://venturebeat.com/technology/googles-gemini-can-now-ru...
It mentions that Gemini can run on eight NVIDIA GPUs, but not which GPU and which Gemini model. Either way, this puts an upper bound of 288 * 8 = 2304 GB on the size of the Gemini model, which as far as I know has been a secret until now.
To run an 8-bit quantized version of that you need roughly 5TB of RAM.
Today that is around 18 Nvidia B300s. That's around $900,000, not including the computers to run them in.
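A quick back-of-the-envelope of the figures above. The parameter count, per-GPU price, and byte-per-parameter assumption are all mine, not official numbers; it's just a sketch of how the ~5TB and ~$900k fall out.

```python
# Back-of-the-envelope sketch (assumed figures, not vendor specs):
# ~5T parameters at 8-bit (1 byte each), 288 GB HBM per B300,
# and an assumed ~$50k price per GPU.
import math

params = 5e12            # assumed parameter count
bytes_per_param = 1      # 8-bit quantization
hbm_per_b300_gb = 288    # B300 HBM capacity
price_per_b300 = 50_000  # assumed price, USD

model_gb = params * bytes_per_param / 1e9
gpus = math.ceil(model_gb / hbm_per_b300_gb)
print(f"model size: ~{model_gb / 1000:.1f} TB")          # ~5.0 TB
print(f"GPUs needed: {gpus}")                            # 18
print(f"GPU cost alone: ~${gpus * price_per_b300:,}")    # ~$900,000
```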
It's true that the capability of open source models is improving, but running actual frontier models on your MBP seems a way off.
[1] https://x.com/elonmusk/status/2042123561666855235?s=20 (and Elon has hired enough people out of those labs to have a fair idea)
Today's LLMs are able to pack much more capability into fewer parameters compared to 2023. We might still be at a very rudimentary phase of this technology, where low-hanging efficiency gains are to be had left and right. These models consume many orders of magnitude more energy than a human brain, so this all seems like room for improvement.
The right question: is there a law in information theory that fundamentally prevents a 70B model of any architecture from being as smart as Opus 4.7?
Or so they say.
If it's true then that just shows how far behind the cloud providers are lagging while wasting investor money.
(There's a huge amount of diminishing returns in increasing parameter counts and the intelligent AI company should be hard at work figuring out the optimal count without overfitting.)
You could run it on a cluster of nodes that each do some mix of fetching parameters from disk and caching them in RAM. Use pipeline parallelism to minimize network bandwidth requirements given the huge size. Then time to first token may be a bit slow, but sustained inference should achieve enough throughput for a single user. That's a costly setup of course, but it doesn't cost $900k.
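To make the idea concrete, here is a minimal single-process sketch of that layout: each "node" owns a slice of layers, lazily pulls weights from disk the first time they're needed, caches them in RAM, and only activations move between stages. All sizes, names, and the fake disk loader are toy assumptions for illustration, not a real distributed setup.

```python
# Toy sketch: pipeline stages that lazily load and cache their own weight
# shards; activations are the only thing passed from stage to stage.
import numpy as np

HIDDEN = 64          # toy hidden size
LAYERS_PER_NODE = 4  # layers owned by each pipeline stage
NUM_NODES = 3

rng = np.random.default_rng(0)

def load_layer_from_disk(layer_id):
    # Stand-in for reading a weight shard off SSD (here: just generate it).
    return rng.standard_normal((HIDDEN, HIDDEN)) * 0.01

class PipelineStage:
    def __init__(self, layer_ids):
        self.layer_ids = layer_ids
        self.cache = {}  # RAM cache: layer_id -> weights

    def forward(self, acts):
        for lid in self.layer_ids:
            if lid not in self.cache:          # fetch from "disk" on first use
                self.cache[lid] = load_layer_from_disk(lid)
            acts = np.tanh(acts @ self.cache[lid])
        return acts                            # only activations leave the node

stages = [PipelineStage(range(i * LAYERS_PER_NODE, (i + 1) * LAYERS_PER_NODE))
          for i in range(NUM_NODES)]

x = rng.standard_normal((1, HIDDEN))           # one token's activations
for stage in stages:                           # stage i feeds stage i+1
    x = stage.forward(x)
print("output shape:", x.shape)
```

In a real deployment each stage would live on its own machine and the disk reads would be overlapped with compute, but the traffic pattern is the same: weights stay local, activations (small relative to the model) cross the network.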
Not sure this is an MBP either.
In mid-2028 we'll have N2E/N2P with around 15% greater transistor density than today's N3P, and by EOY 2028 we'll likely have A14 with about a 35-40% density improvement.
Meanwhile, we'll be on LPDDR6 by that point, which takes M-series Pros from 307GB/s -> ~400GB/s, and Max's from 614GB/s -> ~800GB/s.
Model improvements obviously will help out, but on the raw hardware front these aren't in the ballpark for frontier model numbers. An H100 has 3TB/s memory bandwidth, fwiw
In practice unless you're doing some kind of deep research thing with the cloud, it'll try to optimize mostly for time and get you a good enough answer rather than spending an hour or two. An hour of cloud searching with huge data stores is not equivalent to an hour of local agentic searching, presumably.
I think that problem will improve a little in the coming years as we kind of create optimized data curation, but the information world will keep growing so the advantage will likely remain with centralized services as long as they offer their complete potential rather than a fraction.
Same with the CPU. Linux compiled faster on an M1 than on the fastest Intel i9 at the time, again using only 25% of the power budget.
And the M-series has only gotten better.
It is kind of sad Apple neglects helping developers optimize games for the M-series because iDevices and MacBooks could be the mobile gaming devices.
You're cooked if you actually believe this
For a Qwen 3.6 35B / 3B MoE, 4-bit quant:
- parsing a 4k prompt on a M4 Macbook Air takes 17 seconds before generating a single token.
- on an M4 Max Mac Studio it's faster at 2.3 seconds
- on an RTX 5090, it's 142ms.
RTX 5090 uses more power than an M4 Max Mac Studio but it's not 16x more power.
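Converting those reported times into rough prefill throughput makes the gap easier to see. This assumes the "4k prompt" is exactly 4096 tokens, which is my assumption rather than anything stated above.

```python
# Rough prefill throughput implied by the reported times.
prompt_tokens = 4096  # assumed length of the "4k" prompt
times_sec = {
    "M4 MacBook Air": 17.0,
    "M4 Max Mac Studio": 2.3,
    "RTX 5090": 0.142,
}

for name, secs in times_sec.items():
    print(f"{name}: ~{prompt_tokens / secs:,.0f} tokens/sec prefill")

# ~16x gap between the Mac Studio and the 5090
print(f"5090 vs Mac Studio: ~{2.3 / 0.142:.0f}x faster")
```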
The thing that Apple has always been excellent at is efficiency - even during the Intel era, MacBooks outclassed their Windows peers. Same CPU, same RAM, same disks, so it definitely wasn't the hardware; it was the software that allowed Apple to pull much more real-world performance out of the same clock cycles and power usage.
Windows itself, and especially third-party drivers, are disastrous when it comes to code quality, and they are much, much more generic (and thus inefficient) compared to Apple with its very small number of SKUs. Apple insisted on writing all drivers, and IIRC even most of the firmware for embedded modules, itself to achieve that tight control... which was (in addition to the 2010-ish lead-free Soldergate) why they fired NVIDIA from making GPUs for Apple - NV didn't want to give Apple the specs any more to write drivers.
I think that's a valid demand, considering Nvidia's budding commitment to CUDA and other GPGPU paradigms. Apple, backing OpenCL, would have every reason to break Nvidia's code and ship half-baked drivers. They did it with AMD's GPUs later down the line, pretending like Vulkan couldn't be implemented so they could promote Metal.
Apple wouldn't have made GeForce more efficient with their own firmware, they would have installed a Sword of Damocles over Nvidia's head.
There are other workloads where the M1 actually beats the 3090.
Apple does plenty of hyping but it's always cute when irrational haters like you put them down. The M1 was (well, is) a marvel and absolutely smokes a 3090 in perf per watt.
Find or link these workloads you think exist, please
> The M1 was (well, is) a marvel and absolutely smokes a 3090 in perf per watt.
The GTX 1660 also smokes the 3090 in perf per watt. Being more efficient while being dramatically slower is not exactly an achievement, it's pretty typical power consumption scaling in fact. Perf per watt is only meaningful if you're also able to match the perf itself. That's what actually made the M1 CPU notable. M-series GPUs (not just the M1, but even the latest) haven't managed to match or even come close to the perf, so being more efficient is not really any different than, say, Nvidia, AMD, or Intel mobile GPU offerings. Nice for laptops, insignificant otherwise
The context of this thread isn't consumer chips, but Apple's analog to an H/B200.
> The GPU is monstrously good. Depending on the workload, the M1 series GPU using 120W could beat an RTX 3090 using 420W.
You're just listing the TDP max of both chips. If you limit a 3090 to 120W then it would still run laps around an M1 Max in several workloads despite being an 8nm GPU versus a 5nm one.
> It is kind of sad Apple neglects helping developers optimize games for the M-series
Apple directly advocated for ports like Death Stranding, Cyberpunk 2077 and Resident Evil internally. Advocacy and optimization are not the issue, Apple's obsession over reinventing the wheel with Metal is what puts the Steam Deck ahead.
Edit (response to matthewmacleod):
> Bold of them to reinvent something that hadn't been invented yet.
Vulkan was not the first open graphics API, as most Mac developers will happily inform you.
OpenGL had become too unmanageable, which is why devs moved to DirectX.
Unless you meant a different one?
Surprised Apple didn't create a TPU-like architecture. Another misstep from John Giannandrea.
Apple had the technology to scale down a GPGPU-focused architecture just like Nvidia did. They had the money to take that risk, and had the chip design chops to take a serious stab at it. On paper, they could have even extended it to iPhone-level edge silicon similar to what Nvidia did with the Jetson and Tegra SOCs.
(Like “I want to do object detection for cutting people into stickers on device without blowing a hole in the battery, make me a chip for that”.)
Bold of them to reinvent something that hadn't been invented yet.
OpenAI has nothing. Their tech will rapidly be devalued by free models the moment they stop lighting stacks of cash on fire.
The parent post was arguing that they can do this now because they are lighting stacks of cash on fire. And once they stop doing that, their LLM lead will be gone in a hurry. They appear to not have a moat, like other more established players do.
How is this helping OpenAI?
https://www.reuters.com/business/retail-consumer/openai-taps...
For inference? This is from July 2025: OpenAI tests Google TPUs amid rising inference cost concerns, https://www.networkworld.com/article/4015386/openai-tests-go... / https://archive.vn/zhKc4
> ... due to the exclusive deal with Microsoft
This exclusivity went away in Oct 2025 (except for 'API' workloads).
OpenAI has contracted to purchase an incremental $250B of Azure services, and Microsoft will no longer have a right of first refusal to be OpenAI’s compute provider.
https://blogs.microsoft.com/blog/2025/10/28/the-next-chapter... / https://archive.vn/1eF0V

The central issue (or so they claimed) was that people might misconstrue my comment as representing the company I was at.
So yeah, I don’t understand why people are making fun of this. It’s serious.
On the other hand, they were so uptight that I’m not sure “opinions are my own” would have prevented it. But it would have been at least some defense.
In my experience it didn't matter at all; they considered it "you work for us, it's known you work for us, therefore your opinions reflect on us".
Absolute nonsense, they don't pay me for 24 hours of the day. I told them where they can stick it (politely) and got a new job.
Also a relief to hear that other people had to deal with this nonsense. I was afraid the reaction would be “there’s no way that happened,” since at the time I could hardly believe it either.
Bold and silly of you to even reveal where you work tbh.
Their employer? They may work at related company, and are required to say this.
But I think you’re right
it's like people are LARPing a Fortune company CEO when they're giving their hot takes on social media
reminds me of Trump ending his wild takes on social media with "thank you for your attention to this matter" - so out of place, it makes it really funny
*typo
At least in large tech companies, they have mandatory social media training where they explicitly tell employees to use phrases like "my views are my own" to keep it clear whether they're speaking on behalf of their employer or not.
Disclaimers aren’t there for folks who are thinking and acting rationally.
They are there for people who are thinking irrationally and/or manipulatively.
There are (relatively speaking) a lot of these people. They can chew up a lot of time and resources over what amounts to nothing.
Disclaimers like this can give a legal department the upper hand in cases like this
A few simple examples:
- There is a person I know who didn’t renew the contract of one of their reports. Pretty straightforward thing. The person whose contract was not renewed has been contesting this legally for over 10 years. The outcome is guaranteed to go against the person complaining, but they have time and money, so they tax the legal team of their former employer.
- There is a mid-sized organization that had a small legal team that had its plate full with regular business stuff. Despite settlements having NDAs, word got out that fairly light claims of sexual harassment and/or EEO complaints would yield relatively easy five-figure payments. Those complaints exploded, and some of the complaints were comical. For example, one manager represented a stance for the department to the C-suite that was 180 degrees opposite of what the group of three managers had agreed to prior. Lots of political capital and lots of time had to be used to clean up that mess. That person’s manager was accused of sex discrimination and age discrimination simply for asking the person why they did that (in a professional way, I might add). That person got a settlement, moved to a different department, and was effectively protected from administrative actions due to it being considered retaliation.
> Sounds like the company in the latter example really screwed up
Interesting. I think they made an unfortunate but sound decision based on their circumstances.
> but how does that connect to disclaimers?
It doesn’t directly.
> Is it just an example of malicious behavior?
Yes. It’s an example of how absolutely bat-shit crazy people can behave in ways that can tax a company’s legal team. Having folks use a disclaimer will almost certainly lighten some of this load in terms of defending against folks who weaponize online comments made by employees.
when i give my hot takes pseudonymously on social media these phrases would be nothing but a LARP
i don't put my real name here nor do i put my professional commitments in my profile, and neither does this guy
That is a bold claim!
"There is no free will." - Dr. Robert Sapolsky
You could reasonably say that "a majority of frontier labs use TPUs to train and serve their models."
[1]: "We train and run Claude on a range of AI hardware—AWS Trainium, Google TPUs" - April 6th, Anthropic on the Google and Broadcom partnership

[2]: "[Apple foundation model]... builds on top of JAX and XLA, and allows us to train the models with high efficiency and scalability on various training hardware and cloud platforms, including TPUs and both cloud and on-premise GPUs" - Apple in 2024
He's been saying whatever is good for Nvidia for years now without any regard for truth or reason. He's one of the least trustworthy voices in the space.
They'll presumably catch up; there is no monopoly on talent held by the US. And that's more true than ever now that the US is actively hostile to immigrants. Scientists who might have come to the US three years ago have little reason to do so now.
But even that distinction is only temporary, since we're determined to piss away any remaining research lead that draws people in.
Hopefully the next administration will work at actively reversing the damage, with incentives beyond just "we pinky-promise not to haul you at gunpoint to a concrete detention center and then deport you to Yemen".
Won't be enough to undo the damage. The US would have to do a full about face, prosecute crimes of the current administration and enact serious core reforms to make it impossible for things to drastically change again in 4 years. Also known as, never going to happen because even the current opposition party doesn't actually want structural change. The world has seen how bad the US can get from a single election, and that isn't changing any time soon.
Been saying that about EU and China for decades now.
Yet the top European and Chinese still come to the US. Even in April 2026.
Google's TPUs have obvious advantages for inference and are competitive for training.