upvote
Me too! The problem is that people don't love having 128gb of DDR5 held back with a laptop-grade iGPU. It puts up strictly non-interactive speed for LLMs of that size.

When you layer those same models across 128gb of dGPUs, then you can actually fill the KV cache in seconds, instead of minutes. And you get higher memory bandwidth on most professional cards.

reply