We've been stuck with the same general caps on standard GPU memory since then, though, perhaps in part because the generational upgrades have gone into memory bandwidth rather than capacity.
A one-time, effective 30% reduction in model size simply isn't going to be some massive unlocker, in theory or in practice.
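
To put rough numbers on that, here's a back-of-the-envelope sketch. The figures are my own illustrative assumptions, not measurements: a 24 GB card, 2 bytes per parameter (fp16), a 70B-parameter model, and weights only (no KV cache or activations). The function names are made up for this example.

```python
def footprint_gb(params_billions, bytes_per_param, reduction=0.0):
    """Approximate weight-only footprint in GB (1 GB = 1e9 bytes)."""
    return params_billions * bytes_per_param * (1.0 - reduction)


def largest_model_billions(vram_gb, bytes_per_param, reduction=0.0):
    """Largest parameter count (in billions) whose weights fit in vram_gb."""
    return vram_gb / (bytes_per_param * (1.0 - reduction))


# Illustrative assumptions only.
VRAM_GB = 24          # assumed memory cap of a single card
BYTES_PER_PARAM = 2   # fp16 weights
REDUCTION = 0.30      # the one-time 30% size reduction

before = footprint_gb(70, BYTES_PER_PARAM)
after = footprint_gb(70, BYTES_PER_PARAM, REDUCTION)
print(f"70B model: {before:.0f} GB -> {after:.0f} GB (card has {VRAM_GB} GB)")

fit_before = largest_model_billions(VRAM_GB, BYTES_PER_PARAM)
fit_after = largest_model_billions(VRAM_GB, BYTES_PER_PARAM, REDUCTION)
print(f"largest model that fits: {fit_before:.1f}B -> {fit_after:.1f}B "
      f"({fit_after / fit_before:.2f}x)")
```

Under these assumptions, a 30% reduction only lets you fit a model about 1/0.7 ≈ 1.43x larger than before, and it's a constant factor you get once. Anything that was several times too big for the card is still too big.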