Hacker News
by SlavikCA 15 hours ago
phamilton 5 hours ago
MTP (multi-token prediction) on an MoE model is hit or miss. If you're bottlenecked on memory, MTP can increase the number of experts active per forward pass (as any batched processing would), and that can eat into its gains.
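
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes each token independently picks its top-k experts uniformly at random, which real routers do not do (neighbouring tokens often share experts), so treat it as a pessimistic estimate. The function names and the 64-expert, top-8 configuration are made up for illustration and do not describe any particular model.

```python
def expected_active_experts(num_experts: int, top_k: int, tokens: int) -> float:
    """Expected number of distinct experts touched in one MoE layer.

    Assumption (hypothetical): every token independently selects `top_k`
    distinct experts uniformly at random out of `num_experts`.
    """
    # Chance that one given expert is NOT selected by a single token.
    p_miss = 1.0 - top_k / num_experts
    # Chance it is missed by all tokens, turned into an expected count of hits.
    return num_experts * (1.0 - p_miss ** tokens)


def relative_expert_traffic(num_experts: int, top_k: int, tokens: int) -> float:
    """Expert weights read per pass, relative to plain one-token decoding."""
    return expected_active_experts(num_experts, top_k, tokens) / top_k


if __name__ == "__main__":
    # Hypothetical layer: 64 experts, 8 routed per token.
    for tokens in (1, 2, 3, 4):
        active = expected_active_experts(64, 8, tokens)
        traffic = relative_expert_traffic(64, 8, tokens)
        print(f"{tokens} token(s) per pass: ~{active:.1f} experts, {traffic:.2f}x weight reads")
```

Under these assumptions, the expert weights read per pass grow almost as fast as the number of tokens being verified, so when decoding is bound by memory the speedup depends heavily on how many drafted tokens are accepted and on how much expert overlap the real router produces.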