Yes, I was thinking about context buffers, which I assume are not small for large models. Those have to be loaded into VRAM, right?
If I keep sending large context buffers, will that hog the batches?
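
To put a rough number on what I mean, here is a back-of-envelope sketch. It assumes the "context buffer" is a standard transformer KV cache (keys and values stored per layer, per token) held in fp16. The function name `kv_cache_bytes` and the model dimensions (32 layers, 32 KV heads, head size 128) are illustrative assumptions, not figures for any particular deployment.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Estimate KV cache size in bytes for a decoder-only transformer.

    Assumes one key and one value vector per layer, per KV head, per token.
    bytes_per_elem=2 corresponds to fp16/bf16 storage.
    """
    # Factor of 2 accounts for storing both keys and values.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


# Hypothetical model dimensions, chosen only for illustration.
per_token = kv_cache_bytes(32, 32, 128, seq_len=1)
full_context = kv_cache_bytes(32, 32, 128, seq_len=4096)

print(per_token / 2**20, "MiB per token")          # 0.5 MiB per token
print(full_context / 2**30, "GiB at 4096 tokens")  # 2.0 GiB at 4096 tokens
```

If that estimate is in the right ballpark, each long request would claim a sizeable slice of VRAM on top of the weights, which is why I am wondering how it affects how many requests fit in a batch.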