undefined

points

[-]

It would apply equally to GPU or RAM inference as those are also bandwidth constrained on decode, so people already try to optimize for it.

[-]

surely the supply of unified memory will rise to meet demand before this is needed