upvote
I don’t see how Anthropic is in a better position. They have a slight edge in model quality right at a time when we’re getting a taste of what cheap, “good enough” AI looks like. They don’t own their own compute. And their own arrogance and lies have alienated a huge chunk of their customer base and alerted everyone to the dangers of being dependent on them.
reply
I personally think not owning their own compute is going to be an advantage.

There is a meteor headed towards all this AI investment that I don't think has been properly accounted for and that is, what happens to all the existing hardware investments when NVidia's next architecture comes out. Blackwell (H100/H200) is the current generation. Rubin (R100, presumably R200) is the next and arrives soon. Now a lot of the investment hasn't been spent yet so will likely be spent on Rubin but at that point, what happens when the next iteration comes out and does 3-4x the compute for the same electricity input and same hardware cost?

Also, what happens when people can run way bigger models on consumer hardware in 5 years? The effective limit for useful local LLMs is currently ~31B parameter models because the RTX 5090 has 32GB of VRAM and Apple's shared memory architecture, which can keep bigger models in memory, just doesn't have the raw processing power.

Anyway, why I argue Anthropic is in a better position (than OpenAI) is that they seem to have captured a market that may well be profitable for them as a company, specifically Claude for coding. So they just haven't burnt quite as much cash as OpenAI so aren't in as deep of a hole.

While I think local models are going to improve maassively over the next few years, running them in a data center at scale is always going to be cheaper for a company. Why? Because they can amortize their costs by running 24/7 and powering them and cooling them is simply cheaper at scale when you're talking about 1000+ engineers who otherwise might only be using their hardware ~40 hours a week.

IMHO Google is in the best position here of all the US companies, even though their models aren't the best, because their data centers are ruthlessly efficient, their homegrown TPUs will eventually catch up (and thus avoid the NVidia tax) and they simply haven't bet the farm on winning AI.

reply
I'm generally with you on all of these ideas.

However, Google probably won't catch up. Nvidia has been winning in spite of the fact that their hardware is general purpose rather than tuned for inference.

Rubin has architectural differences I don't understand that are supposed to make inference much cheaper and faster while still retaining those other more generic capabilities. Their next generation after that is going to do even better at being fast for inference and general purpose.

Google is betting that their TPUs won't depreciate faster than the markup they have to pay to Nvidia. I don't think they will be right.

reply
Why do people who don't follow the prices of A100 talk like they know things about GPU pricing dynamics?

A100s are ~7 years old and going for more than 2 dollars an hour, significantly more expensive than even 2 years ago. This is because anything with 80gb of VRAM or more and made by Nvidia will have economically useful lifespans of like, 10 years.

I could see H100s getting 12 years.

Micheal Berry doesn't know shit about GPUs.

reply
So I was curious about how A100s would do running DeepSeek v4. I can't find any instances of running v4 Pro on even an 8xA100 cluster. So you need to run Flash at ~284B params. A100s don't support FP8 so you're running FP16 so you're taking a hit that way. But I see estimates of 30-50tok/s for an 8xA100 cluster. They're drawing 300-400W each so you're looking at probably 3500+ Watts, which is roughly 0.01tok/W.

Now jump ahead 2 years and you seem to have a massive jump in performance [1]. The tokens/Watt goes up by at least 2 orders of magnitude. And the B100 is 3-4x that. And we're about to hit the R100 (Rubin) cliff.

That's what this is going to come down. When hyperscalar DCs are getting to Gigawatt power usage, it all comes down to power efficiency. Those A100s aren't far from being sold for scrap.

I've been looking into how different companies are handling depreciation for this. Amazon seems to be saying the life is 3-4 years, Google 4-5 and Meta is saying 8+, which I think is wildly optimistic.

[1]: https://lambda.ai/inference-models/deepseek-ai/deepseek-v4-f...

reply
You're focussing on inference ... is it not more likely that A100's are being used for training/fine tuning?
reply
> Another data point on this is the black market for Claude tokens in China [1]. The chat logs themselves are a commodity to train models.

anyone with IQ higher than 130 (thus qualified for actual AI R&D) would be questioning something obvious here -

if they are already doing such dodgy stuff with the aim to maximize profits, why would those resellers have large amount of logs with actual American model responses to sell to those AI labs in the first place. shouldn't they just post train & customize some leading Chinese open source models to pretend to be Opus or GPT for the vast majority of their users (as classified by some models) who don't know much about expected Opus behaviours & not skilled enough to tell the differences?

that is actually the interesting bit not covered in your censored version of the story line, it is also what happens on the ground. your censored version of the story implies that those dodgy resellers using stolen credit cards, pooling accounts with stolen IDs and illegally selling very personal logs would somehow be honest enough to spend extra $ to ensure their victims (aka paying users) can actually use real Opus and GPT. LOL

dude, you failed this IQ test miserably.

reply
You don't actually need a very high IQ to do AI R&D. More than it takes to post IQ comments on this site, maybe.
reply
The galaxy brains in the labs putatively buying the logs wouldn't notice this? Or figure out a structure to prevent this?
reply
resellers wouldn't be trying to sell such junk in the first place. they use faked models to avoid the cost of Opus tokens, not to double dip to scam those with arguably the highest IQ in the country.
reply
deleted
reply