I remember reading somewhere about an HPC "rule of thumb" along the lines of 1GB of RAM per 1GHz of processing being about the right amount for most applications. I don't recall where that came from or what the exact reasoning was. That roughly aligns with 2GB/core, assuming cores clocked at around 2GHz. Was that a thing?
reply
Maybe it was a thing from before clock speeds saturated and Moore's law mutated into being satisfied by ever-increasing core counts? It could have been useful when spec'ing hardware with the goal of running multi-threaded (MT) jobs that scale both their throughput and RAM requirements linearly with the number of threads. That would make it sort of a continuous version of the more discrete GB/core rule of thumb. However, such linearly scaled MT jobs are only a subset of all MT jobs, and single-threaded (ST) jobs are not covered at all. A 2GB ST job on a machine with 1GHz cores and 1GB per core will again need to reserve two cores to get its memory, wasting one. When run on a machine with 5GHz cores and 5GB per core, the job needs only its one core but must waste 3GB.
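
To make that arithmetic concrete, here is a rough sketch. It assumes a simplified model in which a node hands out memory in fixed per-core slices sized by the GB/GHz rule. The function name `slot_waste` and its parameters are made up for illustration and do not come from any real scheduler.

```python
import math

def slot_waste(job_mem_gb, job_threads, clock_ghz, gb_per_ghz=1.0):
    """Return (cores_reserved, idle_cores, idle_mem_gb) for one job.

    Assumed model: each core comes with clock_ghz * gb_per_ghz of RAM,
    and a job must reserve whole cores to obtain memory.
    """
    gb_per_core = clock_ghz * gb_per_ghz
    # Cores needed just to cover the job's memory footprint.
    cores_for_mem = math.ceil(job_mem_gb / gb_per_core)
    # The job reserves whichever is larger: its threads or its memory slices.
    cores_reserved = max(job_threads, cores_for_mem)
    idle_cores = cores_reserved - job_threads
    idle_mem_gb = cores_reserved * gb_per_core - job_mem_gb
    return cores_reserved, idle_cores, idle_mem_gb

# 2GB single-threaded job on 1GHz cores: reserves two cores, one sits idle.
print(slot_waste(job_mem_gb=2, job_threads=1, clock_ghz=1.0))
# Same job on 5GHz cores: one core is enough, but 3GB goes unused.
print(slot_waste(job_mem_gb=2, job_threads=1, clock_ghz=5.0))
```

Under this model the waste just moves between cores and memory depending on where the job's own GB/thread ratio falls relative to the machine's.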

In particle physics, as more of the code is being GPU-accelerated, there is now another integer ratio to worry about optimizing: CPU cores per GPU device. Across the landscape, some jobs have zero GPU acceleration, while others may need 100 cores to keep a GPU busy, or only 1 core. Yet others can tune their CPU/GPU ratio to optimize throughput for whatever hardware ratio a given facility provides. Only a fraction of the software in the ecosystem takes up this challenge.
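
A similarly rough sketch of the CPU/GPU mismatch, assuming a job with a fixed number of cores needed per GPU and a node that can only be carved up in whole cores and whole GPUs. The node sizes and the name `node_usage` are hypothetical.

```python
def node_usage(node_cores, node_gpus, cores_per_gpu):
    """Return (gpus_busy, gpus_idle, cores_idle) for a GPU-accelerated job.

    cores_per_gpu is the (assumed fixed, >= 1) number of CPU cores the job
    needs to keep a single GPU busy.
    """
    # GPUs are limited either by how many exist or by how many the cores can feed.
    gpus_busy = min(node_gpus, node_cores // cores_per_gpu)
    cores_busy = gpus_busy * cores_per_gpu
    return gpus_busy, node_gpus - gpus_busy, node_cores - cores_busy

# Hypothetical node with 64 cores and 4 GPUs.
print(node_usage(64, 4, cores_per_gpu=1))   # GPU-bound: most cores sit idle
print(node_usage(64, 4, cores_per_gpu=16))  # matches the node's own ratio
print(node_usage(64, 4, cores_per_gpu=32))  # CPU-bound: half the GPUs sit idle
```

A job that can tune its `cores_per_gpu` toward the node's own ratio (16 here) leaves nothing idle, and one with a fixed ratio wastes whichever resource it needs less of.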

Most physicists pretending to be software developers, or vice versa, who are involved in the field do not consider any of these computing realities. At some level, that's natural and excusable. It's hard enough to develop the simulation, reconstruction, and analysis algorithms. Simultaneously optimizing their implementation for throughput on a given hardware assumption is even harder. Harder still is doing that optimization across the variety of hardware assumptions. There are only a few cases where this holistic thinking has driven the design of the software.

reply
It's endlessly fascinating to me how long it's taking for GPUs to be adopted for this kind of workload. I remember all the excitement around the GPGPU era, OpenCL, and eventually CUDA. That was like 15yr ago! Yes, GPUs can do a fantastic amount of computing. But it's really hard to make them do it efficiently. I think maybe the implicit assumption at the time was that something would come along that would make it easier. Despite continuous advances over the last decade+ it's still really hard.

I feel like we're about to learn a similar lesson with generative AI. Things don't always get easier/better/faster.
