>Anyway, I wasn't able to test using proxmox or vmware on there to split up cpu/memory resources; we decided instead to just buy a bunch of smaller-core-count AMD Ryzen 1Us instead, which scaled way better with my naive approac
If that was single 384 (192 times 2 for hyperthreading) CPU you are getting "only" 12 DDR5 channels, so one RAM channel is shared by 16c/32y
So just plain 16 core desktop Ryzen will have double memory bandwidth per core
And 384 actual cores or 384 hyperthreading cores?
Inference is so memory bandwidth heavy that my expectations are low. An EPYC getting 12 memory channels instead of 2 only goes so far when it has 24x as many cores.