Even so, the multipliers and shifters occupy only a small fraction of the total area, a fraction that is smaller then implied by their number of gates, because they have very regular layouts.
A reduction from the ideal 1:2 FP64/FP32 throughput to 1:4 or in the worst case to 1:8 should be enough to make negligible the additional cost of supporting FP64, while still keeping the throughput of a GPU competitive with a CPU.
The current NVIDIA and AMD GPUs cannot compete in FP64 performance per dollar or per watt with Zen 5 Ryzen 9 CPUs. Only Intel B580 is better in FP64 performance per dollar than any CPU, though its total performance is exceeded by CPUs like 9950X.