A DGX B200 costs roughly $0.5M and draws around 14 kW.

If you run it flat out at 100% utilization for 8 years, that's around 1 GWh.

A gigawatt-hour is a lot of energy, but it's not that much compared to the price of the actual machine. In Germany, for example, with its expensive energy, that's about €100k worth, which spread over 8 years is pretty minor compared to the upfront half million.
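
For anyone who wants to check the napkin math, here is a minimal Python sketch. The 14 kW draw, the 8 years, and the 100% utilization come from the figures above; the 0.10 EUR/kWh rate is an assumption back-derived from "about €100k per GWh", not a quoted tariff, and the function name is made up for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # ignores leap days


def lifetime_energy_kwh(power_kw: float, years: float, utilization: float = 1.0) -> float:
    """Energy drawn over the machine's lifetime, in kWh."""
    return power_kw * HOURS_PER_YEAR * years * utilization


# Figures from the comment: ~14 kW, 8 years, 100% utilization.
energy_kwh = lifetime_energy_kwh(power_kw=14, years=8)
energy_gwh = energy_kwh / 1_000_000

# ASSUMPTION: 0.10 EUR/kWh, implied by "about EUR 100k for 1 GWh".
ASSUMED_EUR_PER_KWH = 0.10
energy_cost_eur = energy_kwh * ASSUMED_EUR_PER_KWH

print(f"{energy_gwh:.2f} GWh, roughly EUR {energy_cost_eur:,.0f}")
```

That lands just under 1 GWh and just under €100k. Plug in your own tariff to see how the comparison against the purchase price shifts.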

The real issue with high power consumption is not really the cost of energy but the limited power supply you can get for a datacenter. A more efficient setup is highly desirable because it means you can fit more compute into the limited power hookup.

reply
It's not even about the costs; getting enough power for a large datacenter at a single location is impractically hard in most of the world.

If it's efficient, and not just the ongoing power costs but also the upfront setup costs are lower, that makes many different scales of data center practical, especially for inference, which doesn't need massive superclusters.

You can't just fire up gas turbines everywhere like US data centers are doing. I am not even sure if that's legal in the US...

Note that you have to plan for peak usage and a lot else; large-scale data centers are insane infrastructure projects.

Nvidia is both supply and price constrained. Sure, if you are willing to pay over $0.5M you might get some, but if you try to balance price against cost by going slightly lower on the pole, you realize just how much more expensive Nvidia truly is. It feels like AMD has a lot of margin to undercut them if they want to.

reply
> but the limited powersupply you can get for a datacenter.

Since many people haven't seen 10 MW cabling for a data center or how a big GPU server is cabled, they naturally imagine connecting servers is akin to plugging an appliance into a wall.

When the electricity provider says "I neither have the capacity, nor the required cables in that area", things get real.

reply
What they're really asking the authors is "can you not lie about performance cost and do proper accounting?". You can spin any story if you cherry pick your framing sufficiently. Stopping right at the silicon packaging boundary is as meaningless as it seems.

The article is highly qualified but the headline is not. If they are not making general statements then they shouldn't open with them.

reply
It’s more than power supply. Cooling and ventilation become a MUCH bigger deal at rack scale, and that costs electricity too.
reply
With liquid cooling technologies (direct or rear door heat exchange), cooling efficiently is easier than it was a decade ago, and it's pretty efficient when you compare the power consumption numbers (server total vs. cooling total).

See PUE (Power Usage Effectiveness) for its scientific form.

reply
Cooling demand is only a fraction of the load: cooling 1 MW of heat will only cost a few tens to low hundreds of kW, depending on the specifics. 10-20% overhead on cooling is probably a close enough estimate for napkin math.
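
To tie that to the PUE metric mentioned upthread, here is a rough sketch. PUE is total facility power divided by IT power. The sketch assumes cooling is the only overhead, which is a simplification: real PUE figures also include power distribution losses, lighting, and so on. The function name is just for illustration.

```python
def pue(it_load_kw: float, overhead_kw: float) -> float:
    """Power Usage Effectiveness: total facility power divided by IT power."""
    return (it_load_kw + overhead_kw) / it_load_kw


it_load_kw = 1000  # 1 MW of IT load, as in the comment above

# ASSUMPTION: cooling is the only overhead, at 10-20% of the IT load.
pue_low = pue(it_load_kw, 0.10 * it_load_kw)
pue_high = pue(it_load_kw, 0.20 * it_load_kw)

print(f"PUE between {pue_low:.2f} and {pue_high:.2f}")
```

So a 10-20% cooling overhead corresponds to a PUE of roughly 1.1 to 1.2 under that simplification.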
reply
And datacenters have an impact on everything around them. If at the end of the day the result is a few more yachts and jets, and a lot more miserable humans starving in ruined ecosystems, maybe that’s not the best go-to direction.
reply
You say they have a large impact, but having lived somewhere with some of the largest data centers, they very much don't. At least not more so than any other structure that paves over greenery.

Love to debate actual discussion points. Pull up "datacenter dfw" on Google Maps for mine.

reply
The people having glass literally break from the vibrations would probably disagree with your opinion

https://youtu.be/_bP80DEAbuo?is=sg09k66iutKFIFSo

Yet here we are, discussing "data centers" as if they're standardized and of similar (noise) isolation.

There are no meaningful regulations in building them, and they can be incredibly polluting. So your experience with a potentially well-isolated one is sadly not the norm going forward. And we don't even know how close you lived. If you're talking about, e.g., "within 5 km/3 miles", then your experience would also have little value in this discussion in general.

reply
>The people having glass literally break from the vibrations would probably disagree with your opinion

Can you cite a source for this? It's not in the video, as far as I can tell.

I would be wary of Benn Jordan's videos. They are full of mistakes and misrepresentations, as Andy Masley has convincingly demonstrated: https://blog.andymasley.com/p/contra-benn-jordan-data-center...

I recall seeing Benn Jordan's responses on Bluesky and thinking they were quite poor. He was unwilling to admit to mistakes, and kept trying to grasp at newly searched papers that didn't actually support his arguments.

reply
Benn unfortunately is one of those people that actually feel stuff, which is a trait that easily gets exploited by bad actors.

Indeed, he shot himself in the foot there pretty bad, but I would argue that that was just the result of successful Agitation.

I would personally strongly prefer being in the same room with Benn compared with Andy, because one of them is authentic, while the other is calculating. Though, arguably, Benn has been catching up on that lately too.

But yeah, taking stuff with a grain of salt should be the default regardless of the person speaking.

reply
The fact that people have lived and worked near data centres for decades and didn't even know what the term meant, let alone been adversely impacted by them, probably indicates they're broadly a non-issue. All of a sudden, out of nowhere, AI and data centres got intermingled by the media and now people seem to have big issues with them.
reply
Because the dynamics have shifted enormously inside the rack.

10 years ago, I was running 4 CPU servers with 48 cores and 128GB of RAM in 2U enclosures with a maximum power consumption of 500W or so. I was able to stick ~20 of them in a 42U rack, totaling 10kW.

A data center full of these can be cooled with CRACs and hot/cold aisles without much problem. This is still too much for a bog-standard server colocation operation, but for HPC, that was normal and manageable.

Now, a ~1U server houses 4 SOTA NVIDIA GPUs, 64 cores, and orders of magnitude more RAM. This server alone uses ~3 kW of power. This means you end up anywhere between 30 kW and 50 kW per rack, and you have many racks.
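
As a rough sketch of the rack math above: the 500 W, 20 servers, ~3 kW, and 30-50 kW figures come from this comment, while the helper names are made up for illustration and the "how many servers fit" calculation assumes power is the only limit.

```python
def rack_power_kw(servers: int, watts_per_server: float) -> float:
    """Total rack draw in kW for identical servers."""
    return servers * watts_per_server / 1000


def servers_for_budget(rack_budget_kw: float, kw_per_server: float) -> int:
    """Whole servers that fit in a rack power budget (power-limited only)."""
    return int(rack_budget_kw // kw_per_server)


# Ten years ago: ~20 servers at ~500 W each.
old_rack_kw = rack_power_kw(servers=20, watts_per_server=500)

# Today: ~3 kW per GPU server, 30-50 kW per rack.
gpu_servers_low = servers_for_budget(rack_budget_kw=30, kw_per_server=3)
gpu_servers_high = servers_for_budget(rack_budget_kw=50, kw_per_server=3)

print(f"old rack: {old_rack_kw:.0f} kW")
print(f"GPU rack: {gpu_servers_low} to {gpu_servers_high} servers")
```

In other words, a rack now draws three to five times the power of the old one while holding fewer machines.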

Of course this means more power comes in, more heat comes out. This means more sophisticated infrastructure: bigger and beefier primary and secondary power systems, beefier cooling, more heat, more noise, in short "more of everything".

Of course when you cram this much energy and heat into a relatively small space, its effect on the environment will be much more pronounced.

Facebook's previous SOTA datacenter used water-infused, HEPA-filtered, free-flowing air across the datacenter. Now, it's server-level direct liquid cooling with extensive water treatment and oversight of coolant parameters.

Compare holding a hand warmer with holding a coal ember in your hand. The latter needs a much more elaborate setup to prevent it from burning you badly.

reply
Why are you implying all datacenters are GPU farms? You can't retrofit that kind of power density into existing buildings.

You can stuff GPU servers into existing buildings- but even with significant upgrades you end up with a lot of empty space on the floor that can't be used.

reply
Two main reasons.

1. The article is about AI, so I gave the example of an AI datacenter.

2. In pure CPU datacenters, the power dynamics do not change much. I can add more servers to a single rack, but the rack power is again in the 30kW to 50kW range, so you're planning and building for the same power capacity.

> You can stuff GPU servers into existing buildings-

Yes.

> but even with significant upgrades you end up with a lot of empty space on the floor that can't be used.

Yes & No. It's not impossible to convert an old datacenter to support ~35 kW/rack capacity, but it's not cheap, and you'll have more worries than holes, piping, building and power. Namely, can your floor handle that much weight to begin with?

reply
Though, the new data centers are not entirely the same. Increasing use of onsite gas turbines to generate power instead of using grid power changes their noise+air pollution profile.
reply
AFAIK, it's only the so-called "portable" generators that OpenAI used to contravene noise and pollution regulations.
reply
The problem these days is lack of nuance. It should seem entirely reasonable to be pro-datacenters-if-they're-done-right, but it feels like there are only two sides to any issue. Gas turbine whine noise isn't coming from the data center, it's being used to power the data center, but the camp is either pro data center or not, and fuck any nuance.
reply
The problem is people keep trying to regulate businesses by name instead of by the effects they have.

If we had regulations on noise, vibration, emissions, water use, electromagnetic radiation, whatever else, then it wouldn’t matter what people tried to build — if it fits within the guidelines great, otherwise back to the drawing board.

Putting “data center” in your ordinances is as lazy and ineffective as putting “abattoir.”

reply
> If we had regulations on noise, vibration, emissions, water use, electromagnetic radiation, whatever else, then it wouldn’t matter what people tried to build

We certainly do! They’re just often overridden and ignored for these companies and data centers.

reply
> If we had regulations on noise, vibration, emissions, water use, electromagnetic radiation, whatever else, then it wouldn’t matter what people tried to build — if it fits within the guidelines great, otherwise back to the drawing board.

Sane jurisdictions do have regulations regarding these things. Not all jurisdictions are sane; some of them are run by people who sell out their residents.

Suburbs and cities around me all have noise regulations, my state has its own pollution regulations, and the local water utilities don’t hook up customers that stress the system. Unfortunately there are places like Texas, Tennessee, Louisiana, Mississippi that don’t give two shits about their citizens and let companies run temporary natural gas turbines permanently and all kinds of other nonsense.

reply
Maybe the lack of nuance is due to learning, through decades of experience, that the assumption “it won’t be done right” can be baked in.
reply
So people have a decades-long expectation that local government will fail them?

This does sound plausible, but it's also pretty sad and not a sign of a healthy democracy.

reply
I'm hard pressed to think of anyone who believes that America has a healthy democracy. Even those most recently elected continually claim that democracy is under threat.
reply
Because the reality is while we all debate the nuances companies just do whatever they want, and it’s usually whatever offloads the most issues to the public because it saves them more money.
reply
Sounds exactly like the stories with 5G cell towers. Almost no problems with GSM, and then suddenly 5G is a big issue.
reply
> There are no meaningful regulations in building them

If a municipality doesn’t have emissions, noise, water use, etc regulations, that’s a serious failure in governance.

We neither need nor want the word “data center” in regulations, any more than we need the word “abattoir.”

The names of the things we build change all the time. Their impact on their communities doesn’t.

We need to regulate impact, not the name or type of business.

If we did, nobody would know or care about data centers and they wouldn’t be affecting their communities, because they’d be operating under established impact regulations.

reply
How far do you live from a data center?
reply
A constant low 60 dB, 20 Hz hum in the background, 24/7, is as close to a torture technique invented by the CIA as it can get.
reply
Plus the power needed for cooling, which adds maybe 50%.
reply
Interesting, so it’s the supply chain, and then you need to calculate how long it can be utilized and how much you can sell it for.

Would love more calculations on that.

reply
> I have never seen a company use AMD outside of wafer and a couple others mostly in US.

There are a few using them, and even more starting to experiment with them. AMD has long been a source of disappointment on this side of things, so I'm hesitant to feel optimistic that we'll finally get some competition. The market really needs viable competition to Nvidia, especially on performance/watt.

reply
It's not clear when this will be - AMD has slipped these dates likely to 2027.
reply
OpenAI maybe, but a few friends at Meta said they don't, so dunno man. Seems sus atm.

But it's Meta; they can get a GW of AMD up in a year.

reply
> I have never seen a company use AMD outside of wafer and a couple others mostly in US.

Worth remembering that AMD has basically "owned" (not literally) the hardware side of things in video game consoles for a good many years now, with no end in sight.

reply
I was talking in the data center GPU context; EPYCs are pretty common in data centers these days.

I have a huge EPYC-based data center like 200-300+ km from my house, on the outskirts of the city, a few dozen miles from an IT industry tech park (a place with lots of IT company offices).

reply
Because they have x86 CPU licenses.
reply
Consoles used to all be custom architectures. If Intel was the only one doing x86 and AMD had offered the same price, performance and features as they do now, but in another architecture, my bet is that in that universe AMD would still have gotten the contract. Using x86 is a big deal to simplify things, but so is AMD's APU with unified memory between CPU and GPU (similar to what Apple now does with their silicon)
reply
Every single video game console of the last generation (and probably further back) is using AMD Radeon for graphics too, FWIW. I think the Switch might be the only recent outlier, using Nvidia graphics.
reply
AMD invented x86_64
reply
> I have never seen a company use AMD outside of wafer and a couple others mostly in US.

Just because you haven't seen it doesn't mean it doesn't exist.

We've serviced over 700 customers on our MI300x.

reply
Typically any company that can’t get Nvidia to fill their orders will have at least some AMD.
reply
What type of company are you talking about here? Granted, nowadays I mostly interact with ML-adjacent companies, but almost none would go "Hmm, hard to get Nvidia hardware today, let's dump all the expertise and knowledge of CUDA et al. we have and start using AMD hardware until we can get Nvidia". Everyone would just wait or rent in the meantime.
reply
Inference workloads are usually a lot less picky about the exact hardware than model training. At least in the cases I know of, the models are trained on Nvidia hardware, then exported and run on a mix of Nvidia and AMD.
reply
At scale, for inference, it's almost unheard of for a data center company to go for AMD because they couldn't get or afford Nvidia atm.

They instead start the build out and plug in stuff they can, then take a loan or ask Nvidia to help fund it. (I am not joking)

I believe the case is that if you can prove to Nvidia you can install and provide more Nvidia capacity, they help out, because more capacity going online today is in Nvidia's best interest.

Spot prices of Nvidia GPUs going up is not good news for Nvidia, btw. The people renting Nvidia have the least friction in moving off Nvidia, especially since with AI tools you could build on and get up to speed with the AMD stack much sooner...

So if Nvidia is truly not an option and your entire company is not a bet on Nvidia, then you will move off, but only as a renter, not as a buyer, unless they truly can't fund Nvidia, I suppose.

But again I repeat if you build a datacenter and provide good enough base Nvidia will help fund you to a mostly complete data center.

People might not like it but that's the reason Nvidia is so unreasonably dominant even now when otherwise given the scale of investments it might have been cheaper to look for alternatives.

This is why Nvidia doesn't like the China stack.

reply
AMD MI355X uses 1,400W per GPU and NVIDIA B200 uses 1,200W. So AMD uses about 16% more power.
reply
That's not how you measure performance per watt, but generally it’s 20-60% worse at tok/s/watt, not 16%. It does have ~50% more memory (~100 GB more), which complicates the comparison.
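
To make the distinction concrete, here is a small sketch. The wattages are the ones quoted in the parent comment (taken as given, not checked against spec sheets) and the 20-60% range is from this comment. No throughput was measured here: the sketch only derives what relative tok/s that range would imply, and the function name is made up for illustration.

```python
# Per-GPU power figures as quoted in the thread (not verified spec-sheet values).
AMD_MI355X_WATTS = 1400
NVIDIA_B200_WATTS = 1200

power_ratio = AMD_MI355X_WATTS / NVIDIA_B200_WATTS
extra_power_pct = (power_ratio - 1) * 100


def implied_relative_throughput(efficiency_ratio: float, power_ratio: float) -> float:
    """Efficiency is throughput / power, so throughput ratio = efficiency ratio * power ratio."""
    return efficiency_ratio * power_ratio


print(f"AMD draws {extra_power_pct:.1f}% more power per GPU")

# "20-60% worse at tok/s/watt" means an efficiency ratio between 0.8 and 0.4.
for pct_worse in (20, 60):
    efficiency_ratio = 1 - pct_worse / 100
    throughput = implied_relative_throughput(efficiency_ratio, power_ratio)
    print(f"{pct_worse}% worse tok/s/W implies about {throughput:.2f}x the B200's tok/s")
```

The power gap alone is about 16.7%. Being 20-60% worse per watt would additionally imply roughly 0.93x down to 0.47x the raw tok/s, which is why comparing wattage alone says little about efficiency.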
reply