I used this napkin math for image generation, since the context (prompts) was so small, but I think it's misleading at best for most uses.
Or Strix Halo.
Seems rather oversimplified.
The different quant levels vary widely: for Qwen3.6 it's anywhere from 10GB to 38.5GB depending on the quant.
Qwen natively supports a context length of 262,144, but it can be extended to 1,010,000, and of course the context length can always be shortened.
Just use one of the calculators and you'll get much more useful numbers.
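The reason plain weights-only napkin math misleads is that the KV cache grows linearly with context length and can dwarf the weights at long contexts. A rough sketch of what those calculators compute (the model-shape numbers below are illustrative placeholders, not figures from this thread):

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Memory for the model weights alone, in GB (decimal)."""
    return params_billions * bits_per_weight / 8

def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                ctx_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GB: one K and one V vector per layer,
    per KV head, per token, at fp16 (2 bytes) by default."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * ctx_len / 1e9

# Illustrative 32B-class model at 4-bit quant:
print(weights_gb(32, 4))                      # 16.0 GB of weights
# Same model shape (assumed: 64 layers, 8 KV heads, head_dim 128)
# at a 262,144-token context:
print(kv_cache_gb(64, 8, 128, 262_144))       # ~68.7 GB of KV cache
```

So at a short prompt the weights dominate and the napkin math works, but at long contexts the cache can be several times the weights, which is exactly why the simple estimate breaks down.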
You can get tablets, laptops, and desktops. I think Windows is more limited and might require static allocation of video memory; not because it's a separate pool, just because Windows isn't as flexible.
With Linux you can just select the lowest number in the BIOS (usually 256 or 512MB) and let Linux balance the needs of the CPU/GPU. So you could easily run a model that requires 96GB or more.
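A rough sketch of what that looks like in practice on an AMD APU under Linux: the GPU reaches system RAM through GTT rather than the small BIOS carve-out, and the limit can be raised with the `amdgpu.gttsize` kernel parameter. The size below is an example value for a 128GB machine, not a recommendation, and exact behavior depends on kernel version.

```shell
# See what the driver currently reports for VRAM vs GTT:
sudo dmesg | grep -i "amdgpu.*\(vram\|gtt\)"

# Raise the GTT limit (value in MiB) via the kernel command line,
# e.g. appended to GRUB_CMDLINE_LINUX in /etc/default/grub:
#   amdgpu.gttsize=98304
# then rebuild the grub config and reboot.
```

The BIOS number only sets the static carve-out; the dynamically shared pool is what actually matters for big models.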
All of them. The static VRAM allocation is tiny (512MB); most of the memory is unified.