upvote
Are there any benchmarks (or even vibes!) about the token/second one can expect with this strategy?
reply
No real fixed benchmarks AIUI since performance will then depend on how much extra RAM you have (which in turn depends on what queries you're making, how much context you're using etc.) and how high-performance your storage is. Given enough RAM, you aren't really losing any performance because the OS is caching everything for you.

(But then even placing inactive experts in system RAM is controversial: you're leaving perf on the table compared to having them all in VRAM!)

reply