
OP’s Qwen3.6 27B Q6 seems to come in north of 20GB on Hugging Face, and should run on an Apple Silicon Mac with 32GB RAM. Smaller models work unreasonably well even on my M1/64GB MacBook.

I am getting 10tok/sec on the 27B Qwen3.5 (thinking, Q4, 18GB) on an M4/32GB Mac Mini. It’s slow.
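For a rough sense of where those file sizes come from, here is a back-of-envelope sketch. It assumes the weights take params × bits / 8 bytes at the nominal bit width, which is only a lower bound: real quantized files run larger because they also store scales and keep some tensors at higher precision. The function names and the 8GB `headroom_gb` default (OS, context cache, other apps) are my own illustrative guesses, not anything from LM Studio or Hugging Face.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Lower-bound size of the quantized weights, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         ram_gb: float, headroom_gb: float = 8.0) -> bool:
    """True if weights plus an assumed headroom fit in the given memory.

    headroom_gb is a hypothetical allowance, not a measured value.
    """
    return weights_gb(params_billion, bits_per_weight) + headroom_gb <= ram_gb


if __name__ == "__main__":
    # Nominal bit widths only; actual Q4/Q6 files are somewhat bigger.
    for label, bits in [("4-bit", 4.0), ("6-bit", 6.0)]:
        print(f"27B at {label}: at least ~{weights_gb(27, bits):.1f} GB")
    print("27B 6-bit in 32GB:", fits(27, 6.0, 32))
```

The estimate lands below the 18GB and 20GB+ figures above, as expected for a lower bound, but it is close enough to tell whether a model is worth downloading for a given machine.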

For a 9B (much smaller, non-thinking) I am getting 30tok/sec, which is fast enough for regular use if you need something from the training data (like how to use grep or Hemingway’s favorite cocktail).

I’m using LM Studio, which is very easy and free (as in beer).

by UncleOxidant 6 hours ago


Not who you asked, but I've got a Framework Desktop (Strix Halo) with 128GB RAM. On Linux, up to about 112GB can be allocated to the GPU. I can run Qwen3.5-122B (4-bit quant) quite easily on this box. I find qwen3-coder-next (80B params, MoE) runs quite well at about 36tok/sec. Qwen3.5-27b is a bit slower at about 24tok/sec, but that's a dense model.