Hacker News
by GeekyBear 13 hours ago
by manmal 13 hours ago
That’s what, 14 GB/s? The GPU’s VRAM can do 100x that.
by GeekyBear 13 hours ago
A discrete consumer GPU doesn't have enough fast RAM to run a very large model that hasn't been quantized to hell.
That's why all the projects streaming models into the GPU from an SSD popped up recently.
by manmal 10 hours ago
Yes. There’s just no way to get above 1 token/s that way with a large model.
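A rough back-of-envelope for the numbers in this thread, as a sketch rather than a benchmark. It assumes a dense model where every generated token has to read all the weights once, and it ignores compute time, caching, and any overlap of I/O with compute. The 70B-parameter FP16 model is a hypothetical example, not something named upthread; a mixture-of-experts model that only touches a fraction of its weights per token would do better than this bound.

```python
def max_tokens_per_second(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on decode speed if each token streams all weights once."""
    return bandwidth_gb_s / weights_gb


SSD_GB_S = 14.0              # SSD figure quoted upthread
VRAM_GB_S = SSD_GB_S * 100   # "100x that", taken at face value

# Hypothetical dense model (assumption): 70e9 parameters at 2 bytes each (FP16).
weights_gb = 70e9 * 2 / 1e9  # 140 GB

ssd_rate = max_tokens_per_second(SSD_GB_S, weights_gb)
vram_rate = max_tokens_per_second(VRAM_GB_S, weights_gb)

print(f"SSD streaming: {ssd_rate:.2f} t/s")
print(f"VRAM resident: {vram_rate:.2f} t/s")
```

Under those assumptions, streaming at 14 GB/s only reaches 1 token/s once the weights read per token shrink to about 14 GB, which is the size range where the model would mostly fit in consumer VRAM anyway.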