Hacker News
by physicles, 16 hours ago
by syntaxing, 13 hours ago
Q8 or Q6_UD with no KV cache quantization. I swear it matters even more with small-activated-parameter MoE models, despite the minimal KL divergence drop.
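The "KL divergence drop" here refers to the usual way quantization quality is measured: comparing the quantized model's next-token probability distribution against the full-precision model's at the same positions. A minimal sketch of that measurement with toy logits (the numbers are made up, not real model outputs):

```python
import math

def softmax(logits):
    # Numerically stable softmax: turn raw logits into probabilities
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q):
    # KL(P || Q): how much the quantized distribution Q diverges
    # from the full-precision reference P (in nats)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy logits standing in for one token position's output
fp16_logits = [2.0, 1.0, 0.5, -1.0]
quant_logits = [1.95, 1.05, 0.45, -0.9]  # slightly perturbed, as quantization would

p = softmax(fp16_logits)
q = softmax(quant_logits)
print(kl_divergence(p, q))  # small value: quantization barely moved the distribution
```

The point of the comment is that a small average KL divergence over a benchmark can still understate the quality loss on MoE models with few active parameters, which is why the commenter prefers higher-bit quants and an unquantized (f16) KV cache.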