upvote
It depends on the model and the mix. For some MoE models lately it’s been reasonably fast to offload part of the processing to CPU. The speed of the GPU still contributes a lot as long as it’s not too small of a relative portion of compute.
reply