Now, shrinking them, sure. But I’ve seen nothing indicating that you can just page weights in and out without cratering your performance, like you would with a non-MoE model.
My current system of looking for the 1 in 1000 posts on HN or 1 in 100 on r/LocalLLaMA is tedious.