Hacker News
points
by
wmf
2 hours ago
|
comments
by
cyanydeez
17 minutes ago
not at the VRAM sizes that control how much context to load; also, GPUs aren't as efficient as direct inference.