With unified memory, reading weights from RAM into a GPU compute buffer is not that painful, and you can use partial RAM caching to minimize the impact of other kinds of swapping.
In practical terms, is this kind of architecture available to consumers other than through Apple?
AMD Strix Halo. It's available in the Framework Desktop, various mini PCs, and the ASUS ROG Flow Z13 "gaming tablet." The Z13 is still at $2700 for 128 GB, which is an incredible deal at today's RAM prices.

There's also the Nvidia DGX Spark.

You don't have to only have the experts being actively used in VRAM. You can load as many weights as will fit. If there is a "cache miss" you have to pay the price to swap in the weights, but if there is a hit you don't.
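
A minimal sketch of that hit/miss behavior, assuming a least-recently-used (LRU) eviction policy (the comment above doesn't name one). `ExpertCache`, `capacity`, and `load_fn` are hypothetical names; `load_fn` stands in for whatever actually copies one expert's weights into VRAM.

```python
from collections import OrderedDict


class ExpertCache:
    """Hypothetical LRU cache of MoE expert weights resident in VRAM."""

    def __init__(self, capacity, load_fn):
        self.capacity = capacity        # max number of experts kept in VRAM
        self.load_fn = load_fn          # stand-in for the RAM -> VRAM copy
        self.resident = OrderedDict()   # expert_id -> weights, least recent first
        self.hits = 0
        self.misses = 0

    def get(self, expert_id):
        if expert_id in self.resident:
            # Hit: weights already in VRAM; mark as most recently used.
            self.resident.move_to_end(expert_id)
            self.hits += 1
            return self.resident[expert_id]
        # Miss: pay the price of swapping the weights in.
        self.misses += 1
        if len(self.resident) >= self.capacity:
            # Evict the least recently used expert to make room.
            self.resident.popitem(last=False)
        weights = self.load_fn(expert_id)
        self.resident[expert_id] = weights
        return weights


# Usage: room for 3 experts, router picks experts in this order.
cache = ExpertCache(capacity=3, load_fn=lambda i: f"weights[{i}]")
for expert_id in [0, 1, 2, 0, 3, 0, 1]:
    cache.get(expert_id)
print(cache.hits, cache.misses, list(cache.resident))  # 2 5 [3, 0, 1]
```

The more VRAM you have, the larger `capacity` can be and the more routing decisions land as hits; a frequently used expert (0 above) tends to stay resident.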