Yes. Let's do the math. The fastest SD cards read at around 300 MB/s (https://havecamerawilltravel.com/fastest-sd-cards/). Modern GPUs use 16 lanes of PCIe gen 5, which is 16 x 32 Gb/s = 512 Gb/s = 64 GB/s. That means you'd need over 200 of the fastest SD cards just to match the link. So what you're asking is: is there a reason GPUs don't use 200 SD cards? And I can't think of any way that would work.
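For anyone who wants to sanity-check that, here's the same arithmetic as a tiny Python sketch (the card and lane speeds are the rough figures from above, not a spec):

```python
# Back-of-the-envelope check: fastest SD card (~300 MB/s) vs a
# PCIe gen 5 x16 link. All figures are the ballpark numbers above.
sd_read_gb_s = 0.3                      # ~300 MB/s per card
pcie5_lane_gb_s = 32 / 8                # 32 Gb/s per lane -> 4 GB/s
gpu_link_gb_s = 16 * pcie5_lane_gb_s    # x16 link -> 64 GB/s

cards_needed = gpu_link_gb_s / sd_read_gb_s
print(f"{gpu_link_gb_s:.0f} GB/s / {sd_read_gb_s} GB/s per card "
      f"= {cards_needed:.0f} cards")
```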
SD is obviously the wrong interface for this, but "High Bandwidth Flash" (stacked flash, akin to HBM) is in development for exactly this kind of problem. AMD actually made a GPU with onboard flash (the Radeon Pro SSG) maybe a decade ago, but I think it was a bit early. Today I would love to have a pool of 50 GB/s storage attached to the GPU.
Oh definitely. That past AMD product just stuck 4x M.2 slots onto the board. Done today, that approach would give 50-60 GB/s of read speed, which would be useful, and any of the vendors could build it with existing components.
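A quick sketch of where that 50-60 GB/s figure comes from, assuming roughly 14 GB/s sequential read per current gen-5 NVMe drive (an assumed ballpark, not a measured number):

```python
# Aggregate read bandwidth of the "4x M.2 slots on the board" approach.
nvme_read_gb_s = 14.0     # assumed per-drive gen-5 sequential read
drives = 4
pool_gb_s = drives * nvme_read_gb_s
print(f"{drives} drives x {nvme_read_gb_s:.0f} GB/s = {pool_gb_s:.0f} GB/s")
```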
The next-gen inference chips will use High Bandwidth Flash (HBF) to store model weights.
HBF stacks are made similarly to HBM but draw less power and offer much higher capacity. They can also be used as a cache to reduce costs when processing long chat sessions.