upvote
MoE is not a bad idea for local inference if you have fast storage to offload to, and this is quickly becoming feasible with PCIe 5.0 interconnect.
reply
MoE is excellent for the unified memory inference hardware like DGX Sparc, Apple Studio, etc. Large memory size means you can have quite a few B's and the smaller experts keeps those tokens flowing fast.
reply