Exactly. Compare MoE with MoE and dense with dense; otherwise it's apples and oranges.
It's coding to coding. I couldn't care less how the model is architected; I only care how it performs in a real-world scenario.