In particle physics, as more of the code is being GPU-accelerated, there is now another integer ratio to worry about optimizing: CPU core per GPU device. Across the landscape, some jobs have zero GPU acceleration, others may need 100 cores to keep a GPU busy, or only 1 core. Yet others can tune their CPU/GPU ratio to optimize throughput given what hardware ratio a given facility provides. Only a fraction of the software in the ecosystem takes up this challenge.
Most physicist pretending to be software developers or vice versa who are involved in the field do not consider any of these computing realities. At some level, that's natural and excusable. It's hard enough to develop the simulation and reconstruction and analysis algorithms. Simultaneously optimizing their implementation for throughput on a given hardware assumption is even harder. Harder still is to do that optimization over the variety of hardware assumptions. There are only a few cases where this holistic thinking has driven the design of the software.
I feel like we're about to learn a similar lesson with generative AI. Things don't always get easier/better/faster.