by zozbot234 3 hours ago

by nl 1 hour ago

Are there any benchmarks (or even vibes!) on the tokens/second one can expect with this strategy?

by zozbot234 1 hour ago

No real fixed benchmarks AIUI, since performance depends on how much spare RAM you have (which in turn depends on what queries you're making, how much context you're using, etc.) and on how fast your storage is. Given enough RAM, you aren't really losing any performance, because the OS caches everything for you.

(But then even placing inactive experts in system RAM is controversial: you're leaving perf on the table compared to having them all in VRAM!)
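
To make the RAM/storage dependence a bit more concrete, here is a back-of-envelope sketch, not a benchmark. It assumes decoding is purely bandwidth-bound, that each token needs a fixed number of expert-weight bytes, and that some fraction of those bytes is already in the OS page cache. The function name, the parameters, and every number in the example are hypothetical placeholders; plug in your own.

```python
def estimate_tokens_per_second(bytes_per_token, cache_hit_rate,
                               ram_gb_per_s, storage_gb_per_s):
    """Crude bandwidth-bound estimate of decode speed.

    Assumptions (all illustrative, none measured):
      - bytes_per_token: expert weights that must be read per token
      - cache_hit_rate: fraction of those bytes served from the page cache
      - ram_gb_per_s / storage_gb_per_s: sustained read bandwidth in GB/s
    Compute time, latency, and prefetching are ignored.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    if bytes_per_token <= 0 or ram_gb_per_s <= 0 or storage_gb_per_s <= 0:
        raise ValueError("sizes and bandwidths must be positive")

    hit_bytes = bytes_per_token * cache_hit_rate
    miss_bytes = bytes_per_token - hit_bytes

    # Time per token = time reading cached bytes from RAM
    #                + time reading uncached bytes from storage.
    seconds_per_token = (hit_bytes / (ram_gb_per_s * 1e9)
                         + miss_bytes / (storage_gb_per_s * 1e9))
    return 1.0 / seconds_per_token


if __name__ == "__main__":
    # Hypothetical machine: 10 GB of weights touched per token,
    # 100 GB/s RAM, 5 GB/s NVMe. Sweep the page-cache hit rate.
    for hit_rate in (0.0, 0.5, 0.9, 1.0):
        tps = estimate_tokens_per_second(10e9, hit_rate, 100.0, 5.0)
        print(f"hit rate {hit_rate:.0%}: ~{tps:.2f} tokens/s")
```

The point of the sweep is only that the estimate is dominated by the miss term until nearly everything fits in cache, which is why a single benchmark number wouldn't transfer between setups.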