For now I will just wait for AMD or Intel to release an x86 platform with 256GB of unified memory, which would let me run larger models and stick to Linux as the inference platform.
At 80B, you could do two A6000s (2x48GB = 96GB of VRAM).
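Rough back-of-envelope sketch in Python, to show why that works: weights at 8-bit or below fit, FP16 doesn't. The 20% overhead factor for KV cache and activations is my assumption; real usage depends on context length and the inference engine.

    # Does an 80B-parameter model fit on 2x A6000 (2x48 GB)?
    GiB = 1024**3

    def weights_gib(params_billion: float, bits_per_param: float) -> float:
        """Approximate weight memory in GiB at a given quantization level."""
        return params_billion * 1e9 * bits_per_param / 8 / GiB

    budget_gib = 2 * 48   # two A6000s
    overhead = 1.20       # assumed headroom for KV cache and activations

    for name, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4)]:
        need = weights_gib(80, bits) * overhead
        print(f"80B @ {name}: ~{need:.0f} GiB needed, fits={need <= budget_gib}")

This prints roughly 179 GiB for FP16 (doesn't fit), 89 GiB for Q8, and 45 GiB for Q4 (both fit), so "two A6000s at 80B" assumes 8-bit quantization or lower.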
What device has 128GB?
If you're targeting end-user devices, then a more reasonable target is 20GB of VRAM, since there are quite a lot of GPU/RAM/APU combinations in that range [1] (orders of magnitude more devices than at 128GB).
[1]: https://www.jeffgeerling.com/blog/2025/increasing-vram-alloc...
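Running the same estimate in reverse gives a feel for what a 20GB budget buys. Again, the 20% overhead factor is an assumption, not a measured number:

    # Rough inverse: what parameter count fits in a 20 GiB budget?
    GiB = 1024**3

    def max_params_billion(budget_gib: float, bits_per_param: float,
                           overhead: float = 1.20) -> float:
        usable_bytes = budget_gib * GiB / overhead
        return usable_bytes * 8 / bits_per_param / 1e9

    for name, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4)]:
        print(f"{name}: up to ~{max_params_billion(20, bits):.0f}B params in 20 GiB")

That works out to roughly 9B at FP16, 18B at Q8, and 36B at Q4, which is the size class most of those GPU/RAM/APU combinations can actually serve.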