Same issue here. I wanted to give it a shot but ran into that error when trying to load the model in LM Studio.
It needs an MLX fork, because the lowest bit width MLX currently supports is 2 (for affine quantization).
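
For anyone wondering what the bit width changes, here is a rough sketch of affine quantization in plain Python. It is not MLX's implementation: the function names `affine_quantize` and `affine_dequantize` are made up for illustration, and it handles a single group of weights with one scale and one bias. It only shows that `bits` sets how many integer codes each weight can take.

```python
def affine_quantize(weights, bits):
    """Quantize one group of floats to `bits`-bit integer codes plus (scale, bias).

    Hypothetical helper for illustration, not an MLX function.
    """
    levels = (1 << bits) - 1            # highest integer code, e.g. 3 for 2 bits
    lo, hi = min(weights), max(weights)
    # Guard against a constant group, where hi == lo would give a zero scale.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo             # bias is the group minimum


def affine_dequantize(codes, scale, bias):
    """Map integer codes back to approximate floats: w ~ code * scale + bias."""
    return [c * scale + bias for c in codes]


if __name__ == "__main__":
    group = [-1.0, -0.5, 0.0, 0.5, 1.0]
    for bits in (4, 2, 1):
        codes, scale, bias = affine_quantize(group, bits)
        approx = affine_dequantize(codes, scale, bias)
        print(f"bits={bits} codes={codes} approx={approx}")
```

With 2 bits each weight maps to one of 4 codes, and with 1 bit only 2 codes remain, so a runtime whose kernels stop at 2 bits has no way to read weights packed at a lower width.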