canpan 16 hours ago
bigyabai 15 hours ago
Seconding this. You can get A3B/A4B models (MoE models with roughly 3B or 4B active parameters) running at 10+ tok/sec on a modern 6 GB or 8 GB GPU with 32k context if you optimize things well. The cheapest way to run this model at larger contexts is probably a 12 GB RTX 3060.
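For a rough sense of why context length is what pushes you toward more VRAM, here's a back-of-the-envelope KV-cache estimate. It assumes standard grouped-query attention with an fp16 cache, and the layer/head numbers are hypothetical placeholders, not taken from any particular A3B model, so treat it as a sketch rather than a real sizing tool.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Approximate KV-cache size in bytes for a given context length.

    Assumes every layer stores one key and one value vector per KV head
    per token (standard attention with grouped-query heads).
    """
    # Factor of 2: one tensor for keys, one for values.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


# Hypothetical config (48 layers, 4 KV heads, head_dim 128) at 32k context, fp16 cache.
size = kv_cache_bytes(n_layers=48, n_kv_heads=4, head_dim=128, context_len=32 * 1024)
print(f"{size / 2**30:.2f} GiB")  # prints "3.00 GiB"
```

The cache grows linearly with context, so doubling to 64k doubles that figure, while quantizing the cache to 8 bits (bytes_per_elem=1) halves it. That comes on top of whatever share of the weights you keep on the GPU.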