I was about to post your last point / quote. Going multigpu is relatively not so though but once you go multi-node you have distributed storage/io/compute system which is highly non trivial. Add that the long training times now you have robustness/fault-tolerantness concerns with hardware failures and restarts. Today’s training systems are engineering marvels.