It isn't 100% clear, but what quantization were you using for each? I've had worse results with MLX 8-bit than with Q4 GGUF of the same model; it seems mxfp8 or bf16 is needed when running with MLX to get something worthwhile out of it. That said, I've done very little testing, so it could have been something specific to the model I was trying at the time.
I was not aware of this. I might not be willing to trade accuracy for speed in this case, then.