I'm referencing it as being possible; however, I didn't share benchmarks because, candidly, the performance would be so slow that it would only be useful for very specific tasks over long time horizons. The more practical use cases are less flashy but capable of achieving multiple tokens/sec (i.e., smaller MoE models where not all experts need to be loaded in memory simultaneously).
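
To make the MoE point concrete, here is a rough sketch of the idea, not the actual implementation: keep only a few experts resident and load the rest on demand, evicting the least recently used one. Every name here (`ExpertCache`, `fake_load`, `moe_forward`) is hypothetical, and the "expert" is a toy function standing in for weights that would really be read from disk.

```python
from collections import OrderedDict


class ExpertCache:
    """Keeps at most `capacity` experts resident; evicts the least recently used."""

    def __init__(self, load_fn, capacity):
        self.load_fn = load_fn          # hypothetical loader, e.g. reads weights from disk
        self.capacity = capacity
        self.resident = OrderedDict()   # expert_id -> loaded expert, oldest first
        self.loads = 0                  # counts cache misses (the slow path)

    def get(self, expert_id):
        if expert_id in self.resident:
            # Cache hit: mark this expert as most recently used.
            self.resident.move_to_end(expert_id)
            return self.resident[expert_id]
        if len(self.resident) >= self.capacity:
            # Cache full: drop the least recently used expert.
            self.resident.popitem(last=False)
        self.resident[expert_id] = self.load_fn(expert_id)
        self.loads += 1
        return self.resident[expert_id]


def fake_load(expert_id):
    # Assumption: stand-in for an expensive weight load; returns a toy "expert".
    return lambda x: x * (expert_id + 1)


def moe_forward(x, routed_ids, cache):
    # Average the outputs of only the experts the router selected for this token.
    return sum(cache.get(i)(x) for i in routed_ids) / len(routed_ids)


cache = ExpertCache(fake_load, capacity=2)
first = moe_forward(1.0, [0, 1], cache)   # loads experts 0 and 1
second = moe_forward(1.0, [1, 2], cache)  # reuses 1, evicts 0, loads 2
```

Throughput then depends mostly on how often the router picks an expert that is not resident, which is why this only pays off when a small working set of experts covers most tokens.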

by causal 5 hours ago


Yeah, the title comes from nowhere in the link. No doubt it's possible, but all that matters is speed, and we learn nothing about that here...