upvote
10k context is not a whole lot, this model theoretically supports up to 1M. But the KV cache storage takes up a whole lot more memory capacity at full context than DeepSeek V4 Pro, let alone Flash. (About ~96GB according to readily available KV cache calculators, might be more in practice. For comparison DeepSeek Flash is ~10GB and Pro is at least in that ballpark.) So I'm not sure that this model is a good deal for memory-constrained machines unless you're specifically interested in very short contexts only. This could still be worth it if it came with a game-changing increase in smarts but that seems a bit unlikely so far.

It will be interesting to see how this model does under a SSD streaming scenario, the lower sparsity should ideally be favorable.

> Local inference needs really hard a 1.2 / 1.5 T/s memory bandwidth system with 512GB and 2/3 times the GPU compute of Mac Studio M3 Ultra, at an affordable 10/15k price point. A variant with 1TB memory would also be welcomed at 20k price point.

Are these realistic specs at present? Not that clear to me, 1.5 T/s seems really high.

reply
Thank you for your work on DwarfStar! It is truly helping democratize access to frontier tech.
reply