And what if your local computer essentially has a model chip with dedicated memory where the model stays loaded 100% of the time?
It's an interesting point, but local GPU efficiency is not something I think about when I'm being rate limited or when my subscription costs keep rising.