It's feasible to put the expert routing logic in a previous layer. People have done it: https://arxiv.org/abs/2507.20984
Not manually, no. The routing would have to be learned, and the unpredictability of the expert selection would need to be a loss term that training minimizes.
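To make that concrete, here is a rough sketch of what such a training term could look like. It is not taken from the linked paper; the names `early_logits`, `router_logits`, and `weight` are hypothetical. The idea is a cross-entropy penalty that grows when a prediction made from an earlier layer disagrees with the distribution the real router produces.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of floats.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def predictability_loss(early_logits, router_logits):
    # Cross-entropy with the real router's distribution as the target
    # and the earlier-layer prediction as the estimate (both hypothetical inputs).
    target = [math.exp(lp) for lp in log_softmax(router_logits)]
    predicted_log = log_softmax(early_logits)
    return -sum(t * lp for t, lp in zip(target, predicted_log))

def total_loss(task_loss, early_logits, router_logits, weight=0.01):
    # Assumed combination: the usual task loss plus a weighted predictability penalty.
    return task_loss + weight * predictability_loss(early_logits, router_logits)
```

The `weight` value is an arbitrary placeholder. Raising it pushes the router toward choices that are easier to predict early, at whatever cost that has for the main task loss.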
Making the expert selection more predictable also means making it less effective. There's no real free lunch.