Hacker News
points
by
jnovek
23 hours ago
by
bityard
23 hours ago
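No, even MoE models need to fit entirely into (V)RAM. MoE inference is faster because only a subset of the experts (the feed-forward sub-networks inside each MoE layer) is used to predict the next token, but which experts get picked changes with every token, so all of them have to stay loaded.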
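To make that concrete, here is a toy sketch of per-token routing. It assumes plain top-k routing with a linear router; the expert count, k, hidden size, and random weights are all made up for illustration and don't correspond to any real model.

```python
import random

NUM_EXPERTS = 8   # hypothetical expert count
TOP_K = 2         # hypothetical number of experts activated per token
DIM = 4           # toy hidden size

rng = random.Random(0)

# Toy router: one weight vector per expert (random, not trained).
router = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]


def route(token_vec):
    """Return the indices of the TOP_K highest-scoring experts for one token."""
    scores = [sum(w * x for w, x in zip(row, token_vec)) for row in router]
    ranked = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:TOP_K])


def simulate(num_tokens):
    """Route random token vectors; return per-token picks and every expert used."""
    choices = []
    touched = set()
    for _ in range(num_tokens):
        token = [rng.gauss(0, 1) for _ in range(DIM)]
        picked = route(token)
        choices.append(picked)
        touched.update(picked)
    return choices, touched


choices, touched = simulate(200)
print("picks for the first few tokens:", choices[:5])
print("experts used overall:", len(touched), "of", NUM_EXPERTS)
```

Each token only runs TOP_K experts, so per-token compute scales with TOP_K. But across a sequence the picks move around, so memory still scales with NUM_EXPERTS unless you accept swapping weights in and out between tokens.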