Qwen3.5 comes in various sizes (including 27B). Judging by the posts on HN, r/LocalLLaMA, etc., it seems to be better than Gemma 4 at logic, reasoning, coding, and tool calling, while Gemma 4 is better at creative writing and world knowledge. Basically nothing has changed since the Qwen3 vs. Gemma3 era.
reply
Does this also apply to Gemma's 26B-A4B vs., say, Qwen's 35B-A3B?

I'm not sure I can make the 35B-A3B work on my 32GB machine.

reply
It should be easy with a Q4 (quantization to 4 bits per weight) and a smallish context.

You won't have much RAM left over though :-/.

At Q4, the model is ~20 GiB:

https://huggingface.co/unsloth/Qwen3.5-35B-A3B-GGUF
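
For a rough sanity check on the ~20 GiB figure, here is a back-of-the-envelope sketch. It assumes 35 billion parameters and an effective 4 to 5 bits per weight (Q4 formats store scaling data alongside the 4-bit weights, so the real average is somewhat above 4; the exact value depends on the quant and is an assumption here). It ignores the KV cache and runtime overhead, so treat the result as a lower bound on actual RAM use.

```python
def estimate_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB (weights only)."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


N_PARAMS = 35e9  # assumed parameter count for a "35B" model

# Effective bits per weight are assumed values, not measured ones.
for bpw in (4.0, 4.5, 5.0):
    size = estimate_weight_gib(N_PARAMS, bpw)
    print(f"{bpw:.1f} bits/weight: about {size:.1f} GiB of weights")
```

Whatever is left of the 32GB after the weights has to cover the context and everything else running on the machine, which is why a smallish context helps.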

reply
For llama-server (and possibly other similar applications) you can specify the number of layers to keep on the GPU with `--n-gpu-layers`. By default this is set to run the entire model in VRAM, but you can set it to something like 64 or 32 to make it use less VRAM. This costs speed, as it will need to swap layers in and out of VRAM as it runs, but it lets you run a larger model, a larger context, or additional models.
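
To make the flag concrete, here is a minimal sketch that only assembles and prints a llama-server command line. The model path, the layer count of 32, and the context size of 8192 are placeholder assumptions rather than recommendations, and the helper name `build_llama_server_cmd` is made up for this example.

```python
import shlex


def build_llama_server_cmd(model_path: str, n_gpu_layers: int, ctx_size: int) -> list[str]:
    """Assemble a llama-server invocation with partial GPU offload."""
    return [
        "llama-server",
        "--model", model_path,                # path to a GGUF file (placeholder below)
        "--n-gpu-layers", str(n_gpu_layers),  # how many layers to keep in VRAM
        "--ctx-size", str(ctx_size),          # smaller context uses less memory
    ]


cmd = build_llama_server_cmd("models/example-Q4.gguf", n_gpu_layers=32, ctx_size=8192)

# Print the command instead of running it, so this works without llama.cpp installed.
print(shlex.join(cmd))
```

Lowering the layer count until the model loads without running out of VRAM is probably the simplest way to find a workable value.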
reply
Gemma 4 31B is still not impressive at coding compared to even Qwen 3.5 27B. It's just not its strong suit.

So far Gemma 4 seems excellent at role playing and document analysis, and decent at making agentic decisions.

reply
This has been my experience as well. Qwen via Ollama locally has been very impressive.
reply