undefined

points

[-]

> Assume FP64 units are ~2-4x bigger.

I'm pretty sure that's not a remotely fair assumption to make. We've seen architectures that can eg. do two FP32 operations or one FP64 operation with the same unit, with relatively low overhead compared to a pure FP32 architecture. That's pretty much how all integer math units work, and it's not hard to pull off for floating point. FP64 units don't have to be—and seldom have been—implemented as massive single-purpose blocks of otherwise-dark silicon.

When the real hardware design choice is between having a reasonable 2:1 or 4:1 FP32:FP64 ratio vs having no FP64 whatsoever and designing a completely different core layout for consumer vs pro, the small overhead of having some FP64 capability has clearly been deemed worthwhile by the GPU makers for many generations. It's only now that NVIDIA is so massive that we're seeing them do five different physical implementations of "Blackwell" architecture variants.

by 9 hours ago|

parent|

[-]

deleted

by jcranmer20 hours ago|

prev|

[-]

> Assume FP64 units are ~2-4x bigger.

I'm not a hardware guy, but an explanation I've seen from someone who is says that it's not much extra hardware to add to a 2×f32 FMA unit the capability to do 1×f64. You already have all of the per-bit logic, you mostly just need to add an extra control line to make a few carries propagate. So the size overhead of adding FP64 to the SIMD units is more like 10-50%, not 100-300%.

by adrian_b14 hours ago|

parent|

[-]

Most of the logic can be reused, but the FP64 multiplier is up to 4 times larger. Also some shifters are up to 2 times larger (because they need more stages, even if they shift the same number of bits). Small size increases occur in other blocks.

Even so, the multipliers and shifters occupy only a small fraction of the total area, a fraction that is smaller then implied by their number of gates, because they have very regular layouts.

A reduction from the ideal 1:2 FP64/FP32 throughput to 1:4 or in the worst case to 1:8 should be enough to make negligible the additional cost of supporting FP64, while still keeping the throughput of a GPU competitive with a CPU.

The current NVIDIA and AMD GPUs cannot compete in FP64 performance per dollar or per watt with Zen 5 Ryzen 9 CPUs. Only Intel B580 is better in FP64 performance per dollar than any CPU, though its total performance is exceeded by CPUs like 9950X.

by adrian_b14 hours ago|

prev|

[-]

A FP64 unit can share most of two FP32 units.

Only the multiplier is significantly bigger, up to 4 times. Some shifters may also be up to twice bigger. The adders are slightly bigger, due to bigger carry-look-ahead networks.

So you must count mainly the area occupied by multipliers and shifters, which is likely to be much less than 10%.

There is an area increase, but certainly not of 50% (300 m^2). Even an area increase of 10% (e.g. 60-70 mm^2 for the biggest GPUs seems incredibly large).

Reducing the FP64/FP32 throughput ratio from 1:2 to 1:4 or at most to 1:8 is guaranteed to make the excess area negligible. I am sure that the cheap Intel Battlemage with 1:8 does not suffer because of this.

Any further reductions, from 1:16 in old GPUs until 1:64 in recent GPUs cannot have any other explanation except the desire for market segmentation, which eliminates small businesses and individual users from the customers who can afford the huge prices of the GPUs with FP64 support.

by wmf21 hours ago|

prev|

[-]

Why would gamers want to pay for any features they don't use?

Obviously they don't want to. Now flip it around and ask why HPC people would want to force gamers to pay for something that benefits the HPC people... Suddenly the blog post makes perfect sense.

by rustyhancock21 hours ago|

parent|

[-]

Similar to when Nvidia released LHR GPUs that nerfed performance for Ethereum mining.

NVIDIA GeForce RTX 3060 LHR which tried to hinder mining at the bios level.

The point wasn't to make the average person lose out by preventing them mining on their gaming GPU. But to make miners less inclined to buy gaming GPUs. They also released a series of crypto mining GPUs around the same time.

So fairly typical market segregation.

https://videocardz.com/newz/nvidia-geforce-rtx-3060-anti-min...

by adrian_b14 hours ago|

parent|

prev|

[-]

NVIDIA could make 2 separate products, a GPU for gamers and a FP accelerator for HPC.

Thus everybody would pay for what they want.

The problem is that both NVIDIA and AMD do not want to make, like AMD did until a decade ago and NVIDIA stopped doing a few years earlier, a FP accelerator of reasonable size and which would be sold at a similar profit margin with their consumer GPUs.

Instead of this, they want to sell only very big FP accelerators and at huge profit margins, preferably at 5-digit prices.

This makes impossible for small businesses and individual users to use such FP accelerators.

Those are accessible only for big companies, who can buy them in bulk and negotiate lower prices than the retail prices, and who will also be able to keep them busy for close to 24/7, in order to be able to amortize the excessive profit margins of the "datacenter" GPU vendors.

One decade and a half ago, the market segmentation was not yet excessive, so I was happy to buy "professional" GPUs, with unlocked FP64 throughput, at a price about twice greater in comparison with consumer GPUs.

Nowadays, I can no longer afford such a thing, because the similar GPUs are no longer 2 times more expensive, but 20 to 50 times more expensive.

So during the last 2 decades, first I shifted much of my computations from CPUs to GPUs, but then I had to shift them back to CPUs, because there are no upgrades for my old GPUs, any newer GPU being slower, not faster.

by david-gpu13 hours ago|

parent|

[-]

Throughout this article you have been voicing a desire for affordable and high-througput fp64 processors, blaming vendors for not building the product you desire at a price you are willing to pay.

We hear you: your needs are not being met. Your use case is not profitable enough to justify paying the sky-high prices they now demand. In particular, because you don't need to run the workload 24/7.

What alternatives have you looked into? For example, Blackwell nodes are available from the likes of AWS.

by adrian_b8 hours ago|

parent|

[-]

I think that you might have confused me with the author of the article.

American companies have a pronounced preference for business-to-business products, where they can sell large quantities in bulk and at very large profit margins that would not be accepted by small businesses or individual users, who spend their own money, instead of spending the money of an anonymous employer.

If that is the only way for them to be profitable, good for them. However such policies do not deserve respect. They demonstrate the inefficiencies in the management of these companies, which prevent them from competing efficiently in markets for low-margin commodity products.

From my experience, I am pretty certain that a smaller die version of the AMD "datacenter" GPUs could be made and it could be profitable, like such GPUs were a decade ago, when AMD was still making them. However today they no longer have any incentive to do such things, as they are content with selling a smaller number of units, but with much higher margins, and they do not feel any pressure to tighten their costs.

Fortunately at least in CPUs there has been a steady progress and AMD Zen 5 has been a great leap in floating-point throughput, exceeding the performance of older GPUs.

I am not blaming vendors for not building the product that I desire, but I am disappointed that years ago they have fooled me to waste time in porting applications to their products, which I bought instead of spending money for something else, but then they have discontinued such products, with no upgrade path.

Because I am old enough to remember what happened 15 to 20 years ago, I am annoyed about the hypocrisy of some discourses of the NVIDIA CEO, which have been repeated for several years after introducing CUDA, which were more or less equivalent with promises that the goal of NVIDIA is to put a "supercomputer" on the desk of everyone, only for him to pivot completely from these claims and remove FP64 from "consumer" GPUs, in order to be able to sell "enterprise" GPUs at inflated prices. Then soon this prompted AMD to imitate the same strategy.

by thesz18 hours ago|

prev|

[-]

  > Assume FP64 units are ~2-4x bigger.

This is wrong assumption. FP64 usually uses the same circuitry as two FP32, adding not that much ((de)normalization, mostly).

From the top of my head, overhead is around 10% or so.

  > Why would gamers want to pay for any features they don't use?

https://www.youtube.com/watch?v=lEBQveBCtKY

Apparently FP80, which is even wider than FP64, is beneficial for pathfinding algorithms in games.

Pathfinding for hundredths of units is a task worth putting on GPU.

by kbolino8 hours ago|

parent|

[-]

Has FP80 ever existed anywhere other than x87?

by tliltocatl16 hours ago|

prev|

[-]

10% sounds implausibly high. Even on GPUs, most of area are various memories and interconnect.