upvote
> As you have so much RAM I would suggest running Q8_0 directly

On the 48GB Mac - absolutely. The 24GB one cannot run Q8, hence the comparison.

> And just to be sure: you're running the MLX version, right?

Nah, not yet. I have only tested in LM Studio, and they don't have recommended MLX versions yet.

> but has since been fixed on the main branch

That's good to know, I will play around with it.

reply
> That too was broken in mlx-lm (it crashed), but has since been fixed on the main branch

Unfortunately I have had zero success running Gemma with the mlx-lm main branch. Can you point me to the right way? I have zero experience with mlx-lm.

reply
Get into a venv, and run:

> pip3 install git+https://github.com/ml-explore/mlx-lm.git

> ./venv/bin/mlx_lm.generate --model "$MODEL" --temp 1.0 --top-p 0.95 --top-k 64 --max-tokens 128000 --prompt "Hello world"

Where $MODEL is an unsloth model like:

- unsloth/gemma-4-E4B-it-UD-MLX-4bit

- unsloth/gemma-4-26b-a4b-it-UD-MLX-4bit

reply
Thanks!
reply
Gemma 4 is not supported by the MLX engine yet.
reply
It is, as I'm running it; support was added this week. As I said, I'm running the main branch from GitHub and doing nothing special, see: https://news.ycombinator.com/item?id=47761308
reply