I think there are drastic differences between computer vision models and LLMs that you’re not considering. LLMs are huge relative to vision models: a typical vision backbone’s weights fit in tens of MB, while even a “small” 7B LLM needs roughly 14 GB at fp16, and that memory has to be fast because autoregressive decoding is largely memory-bandwidth bound. For that reason a little USB dongle isn’t going to cut it.
Put another way, there already exist add-in boards like this, and they’re called GPUs.
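To put rough numbers on it, here’s the back-of-envelope math (parameter counts are ballpark figures, assuming fp16/bf16 weights at 2 bytes each):

```python
# Weight memory needed just to hold a model's parameters, ignoring
# activations and KV cache (which only make the gap worse for LLMs).
def weight_memory_gb(params: float, bytes_per_param: int = 2) -> float:
    """Memory in GB for `params` parameters at `bytes_per_param` each (2 = fp16)."""
    return params * bytes_per_param / 1e9

resnet50 = weight_memory_gb(25.6e6)  # a classic vision backbone, ~25M params
llm_7b = weight_memory_gb(7e9)       # a "small" LLM
llm_70b = weight_memory_gb(70e9)     # a mid-size open-weights LLM

print(f"ResNet-50: {resnet50:.2f} GB")  # ~0.05 GB: fits on almost anything
print(f"7B LLM:    {llm_7b:.0f} GB")    # 14 GB: already past most edge accelerators
print(f"70B LLM:   {llm_70b:.0f} GB")   # 140 GB: multi-GPU territory
```

Three orders of magnitude between the vision model and the 70B LLM, and every one of those bytes has to stream through the compute units per generated token.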