Hacker News
by thewebguyd, 1 day ago
Catloafdev, 1 day ago:
Nobody runs unquantized models; there's literally no reason to. Q8 is the largest quantization anyone actually runs on consumer hardware for inference.
bityard, 22 hours ago:
Halving the precision of the weights is not a free lunch...
Catloafdev, 20 hours ago:
Q8 is virtually lossless. Quantization loss becomes much more noticeable at Q4 and below. Going from FP16 to Q8 on consumer hardware gives about 2x the speed at ~99.99% of the quality.
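As an illustration of the Q8 versus Q4 gap (not from the commenter), here is a minimal sketch of block-wise symmetric absmax quantization in plain Python. Everything in it is an assumption for demonstration: the function names, the block size of 32, the symmetric integer range, and the synthetic Gaussian weights. It is not a reproduction of any specific llama.cpp or GGUF format.

```python
import math
import random


def quantize_block(block, bits):
    """Symmetric absmax quantization of one block to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    amax = max(abs(x) for x in block)
    scale = amax / qmax if amax > 0 else 1.0
    q = [max(-qmax, min(qmax, round(x / scale))) for x in block]
    return q, scale


def dequantize_block(q, scale):
    """Map the stored integers back to approximate floating-point weights."""
    return [v * scale for v in q]


def relative_rms_error(weights, bits, block_size=32):
    """RMS reconstruction error divided by the RMS of the original weights."""
    err_sq = 0.0
    ref_sq = 0.0
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        q, scale = quantize_block(block, bits)
        restored = dequantize_block(q, scale)
        err_sq += sum((a - b) ** 2 for a, b in zip(block, restored))
        ref_sq += sum(a * a for a in block)
    return math.sqrt(err_sq / ref_sq)


# Synthetic stand-in for a weight tensor (assumption: real weights differ).
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

err8 = relative_rms_error(weights, bits=8)
err4 = relative_rms_error(weights, bits=4)
print(f"8-bit relative RMS error: {err8:.4%}")
print(f"4-bit relative RMS error: {err4:.4%}")
```

On synthetic data like this the 8-bit error comes out far smaller than the 4-bit error, which fits the general shape of the claim. Weight reconstruction error is not the same thing as output quality, though, so a sketch like this neither confirms nor refutes the 99.99% figure; that would take a perplexity or benchmark comparison on a real model.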
rvba, 11 hours ago:
Is there any source that confirms the 99.99% quality figure?
bitexploder, 1 day ago:
It also comes down to inference speed, not "can I run this". 8-bit quant is quite a bit slower on an M5 Pro.