The experts in MoEs aren't specialized in any meaningful task sense. At the level of what we would think of as tasks, experts are selected essentially arbitrarily, per token and per block.
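To make "per token and per block" concrete, here is a minimal sketch of top-k routing in plain Python. Everything in it is an assumption for illustration: the function name `top_k_route`, the sizes, and the random router matrices are hypothetical, and real models differ in details (for example, some apply the softmax before picking the top k rather than after).

```python
import math
import random

def top_k_route(token_vec, router_weights, k=2):
    """Return [(expert_index, gate_weight), ...] for one token in one MoE block."""
    # One logit per expert: dot product of the token with that expert's router row.
    logits = [sum(w * x for w, x in zip(row, token_vec)) for row in router_weights]
    # Keep the k experts with the highest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the gate weights sum to 1.
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

rng = random.Random(0)
dim, num_experts, num_blocks = 8, 4, 2
# Hypothetical random router matrices, one per block (a trained model learns these).
routers = [[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
           for _ in range(num_blocks)]
tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3)]

# Every (block, token) pair gets its own independent routing decision.
for b, router in enumerate(routers):
    for t, tok in enumerate(tokens):
        print(f"block {b}, token {t}: {top_k_route(tok, router)}")
```

Nothing in the router sees a task label. It only sees the token's hidden vector at that block, which is why the same token can land on different experts from one block to the next.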
It’s unsupervised, yes, but “unspecialized in any meaningful task sense” is incorrect; specialization is the whole point. It’s just not specialization in the sense of “this is a legal expert, this is a software developer”.
Optimal expert separation depends on the goal and can be pretty arbitrary. For example, DeepSeek v4 separates them more or less by domain, if I remember correctly.