That just sounds like a 3090.
Not at the VRAM sizes that determine how much context you can load. Also, GPUs aren't as efficient as direct inference.