That estimate doesn't account for context, which is very important for tool use and coding.

I used this napkin math for image generation, since the context (the prompts) was so small, but I think it's misleading at best for most uses.
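To see why context matters, here is a minimal sketch of the usual KV-cache estimate. It assumes every layer uses standard (grouped-query) attention and that K and V are cached unquantized at 2 bytes per element (fp16/bf16). The layer count, KV-head count, and head size below are hypothetical placeholders, not the real architecture of any specific model. Models that mix in linear-attention layers cache far less, and activations and runtime overhead are ignored.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Approximate KV-cache size in bytes for standard attention.

    The leading factor of 2 covers one K tensor and one V tensor per layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


# Hypothetical config, for illustration only.
LAYERS, KV_HEADS, HEAD_DIM = 48, 4, 128

for ctx in (4_096, 262_144):
    gib = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, ctx) / 1024**3
    print(f"context={ctx:>7,} tokens -> KV cache ~{gib:.2f} GiB")
```

With these made-up numbers, a 4,096-token prompt needs about 0.38 GiB of KV cache, while a 262,144-token context needs 24 GiB. The cache grows linearly with context, which is why weights-only math works for short image-generation prompts but not for long coding sessions.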

> You won't like it, but the answer is Apple.

Or Strix Halo.

Seems rather oversimplified.

Across the different quant levels, Qwen3.6 ranges from 10GB to 38.5GB.

Qwen supports a context length of 262,144 tokens natively, which can be extended to 1,010,000, and of course the context length can always be shortened.

Just use one of the calculators and you'll get a much more useful number.
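A rough sketch of what such calculators add up, assuming total memory is roughly the quantized weight size plus a KV cache that grows linearly with context. The 10GB and 38.5GB weight sizes and the 262,144 / 1,010,000 context lengths come from the comment above. The per-token KV cost is a hypothetical placeholder (look up the real figure for the model and runtime), and activations and runtime overhead are ignored.

```python
# Hypothetical per-token KV-cache cost in bytes; NOT the real figure for Qwen3.6.
KV_BYTES_PER_TOKEN = 96 * 1024


def total_gb(weights_gb, context_len, kv_bytes_per_token=KV_BYTES_PER_TOKEN):
    """Weights plus KV cache, in GB (10**9 bytes). Ignores runtime overhead."""
    return weights_gb + kv_bytes_per_token * context_len / 1e9


# Smallest and largest quant sizes quoted above, at a few context lengths.
for weights in (10.0, 38.5):
    for ctx in (8_192, 262_144, 1_010_000):
        print(f"weights={weights:>5} GB  ctx={ctx:>9,}  total~{total_gb(weights, ctx):6.1f} GB")
```

Under that placeholder cost, the native 262,144-token context adds about 26 GB on top of the weights, and the extended 1,010,000-token context adds about 99 GB, so the quant choice and the context length both move the answer a lot.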

What Strix Halo system has unified memory? A quick Google search says it's just a static VRAM allocation in RAM, not memory that the CPU and GPU can actively share at runtime.
All of them. Keep in mind that Strix != Strix Halo.

You can get tablets, laptops, and desktops. I think Windows is more limited and might require a static allocation of video memory, not because it's a separate pool, but just because Windows isn't as flexible.

With Linux you can just select the lowest value in the BIOS (usually 256 or 512MB) and let Linux balance the needs of the CPU and GPU. So you could easily run a model that requires 96GB or more.

> What Strix Halo system has unified memory?

All of them. The static VRAM allocation is tiny (512MB); most of the memory is unified.
