undefined

points

[-]

Could you (or anyone with the same hardware) try antirez's ds4 and report how gracefully it degrades with only the 64GB RAM? Obviously it's going to be dog slow at best for any single inference flow, but can you meaningfully improve on that by running many sessions in parallel? (Ideally you'd need roughly on the order of model sparsity in order to get meaningful sharing of MoE weights, but whether that's genuinely achievable is anyone's guess!)