Hacker News
points | by ryandrake 22 hours ago | comments
zozbot234 22 hours ago | next [-]
That "traditional" setup is the recommended setup for running large MoE models, leaving shared routing layers on the GPU to the extent feasible. You can even go larger-than-system-RAM via mmap, though at a non-trivial cost in throughput.
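The larger-than-system-RAM trick works because the runtime maps the weight file read-only and lets the OS page in only the tensors a given token's routed experts actually touch. A minimal sketch of that access pattern, using Python's `mmap` with a dummy weight file (all names here are illustrative, not from any specific inference runtime):

```python
import mmap, os, struct, tempfile

def write_dummy_weights(path, n):
    # Pretend these n little-endian float32 values are one
    # expert's flattened weight matrix inside a larger file.
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{n}f", *[float(i) for i in range(n)]))

def load_expert_slice(path, offset, count):
    # Map the whole file read-only. No read() of the full file
    # happens here; the kernel faults in only the pages that
    # unpack_from actually touches, so the mapping can exceed RAM.
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            return struct.unpack_from(f"<{count}f", mm, offset * 4)
        finally:
            mm.close()

path = os.path.join(tempfile.mkdtemp(), "weights.bin")
write_dummy_weights(path, 1024)
expert = load_expert_slice(path, offset=512, count=4)
print(expert)  # -> (512.0, 513.0, 514.0, 515.0)
```

The throughput cost mentioned above shows up when the working set of active experts exceeds the page cache: each cold expert access becomes a disk read rather than a RAM hit.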
[deleted] 21 hours ago | prev | next
khimaros 21 hours ago | prev [-]
Strix Halo is another option