Hacker News
by SwellJoe, 22 hours ago
by cpburns2009, 22 hours ago
A 32 GB card does run it nicely. I use unsloth's UD-Q5_K_XL at 256k context (KV cache at q8_0) and get ~67 t/s on a 5090. I still need to look into MTP.
by adornKey, 11 hours ago
Nice. I used Q4_K_M to have some headroom. But yours seems to fit nicely.
by pbgcp2026, 20 hours ago
[dead]