SOTA can code but can also prove theorems and teach you about music theory or ancient Greece's substrate language or botany. Speaking in tens of different languages. I wonder how many hundreds of billions of parameters can be saved just by removing much of the general knowledge parts while keeping logical and programming abilities the exact same.
Network Bandwidth, Storage space and speed, memory capacity. While all of these were worth optimizing for at a point in history, that point is behind us. It's probably a reasonable expectation that it will eventually be true for VRAM.