Hacker News
by zozbot234 6 hours ago
by mike_hearn 3 hours ago
You can disaggregate, though, so draft models can run on cheaper hardware with less RAM, saving time on the more expensive machines with more RAM.
by cma 4 hours ago
I think it also gets used in the /fast modes the providers sell at higher cost.
by gunalx 2 hours ago
They probably use it on all models. Fast is probably just a resource pool with less congestion, and therefore higher throughput per user, but less efficient.