by rbbydotdev 8 hours ago

by Catloafdev 3 hours ago

I think people are going to continue to be surprised by the capability of small models.

Now, if you ask this model to have a conversation with you, it's gonna fail and be incoherent. But boy, does it sure reason through math problems well.

by bakies 7 hours ago

I started using qwen3.6:35b a couple of days ago on my Framework Desktop and I'm rather impressed. It runs really well and reminds me of probably the first Claude model I used. It's the first local model I've tried that actually works for me in a coding agent. Very exciting!

by smcleod 6 hours ago

Try 27b, it's significantly smarter than 35b-a3b (although it is slower, it's not so bad with MTP).

by ignoramous 6 hours ago

At least according to gertlabs, Qwen3.6 27B outperforms every SoTA (closed) model at Kotlin: https://archive.vn/RYBCL / https://gertlabs.com/rankings?mode=agentic_coding&language=k...

by iosjunkie 5 hours ago

Interesting. I wonder if there is an opportunity to train a set of small model variants that each excel at a certain stack, e.g. Qwen3.6-27B for Node + React or Qwen3.6-27B for Rust + TUI.

by gertlabs 2 hours ago

Qwen 3.6 27B is an anomalously strong all-around model for its size, but when we run our evaluations, we generate 10 coding submissions/language/model (110 total). So, full disclosure: the per-language, per-model results can be noisy. (I do not think Qwen3.6 27B is better than Fable 5 in agentic workflows when writing Kotlin, given enough samples, although we do find some interesting anomalies that hold up under large sample sizes.)
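To put a rough number on that noise, here is a back-of-envelope sketch. It assumes each submission is an independent pass/fail trial, which the comment does not state, and uses a hypothetical true pass rate of 0.5 (the worst case for variance). The function name is made up for illustration.

```python
import math

def pass_rate_std_error(p: float, n: int) -> float:
    """Standard error of an observed pass rate over n independent pass/fail trials."""
    return math.sqrt(p * (1.0 - p) / n)

submissions_per_language = 10  # from the comment above
total_submissions = 110        # from the comment above
# Number of languages implied by those two figures.
languages = total_submissions // submissions_per_language

HYPOTHETICAL_PASS_RATE = 0.5   # assumption, chosen because it maximizes the variance

print(f"languages implied: {languages}")
for n in (submissions_per_language, total_submissions):
    se = pass_rate_std_error(HYPOTHETICAL_PASS_RATE, n)
    print(f"n={n}: standard error of the pass rate is about {se:.3f}")
```

Under these assumptions a single language's score (n=10) swings by roughly three times as much as the all-language aggregate (n=110), which fits the caveat that per-language rankings can flip with more samples.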

by bakies 4 hours ago

Hmm, I just assumed bigger was better. How's it different?

by Lalabadie 4 hours ago

Off the top of my head since it seems to be the quick info you're looking for: IIRC, with these two, the 27B is a dense model, meaning it's all active at inference. Meanwhile, the 35B is a Mixture of Experts (MoE), so only part of its network (3B?) is active at any time.
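A quick sketch of that difference in numbers. The parameter counts are read off the model names used in this thread, and treating "A3B" as "about 3B active parameters per token" is an assumption (the comment itself hedges with "3B?"). The helper name is made up for illustration.

```python
# Parameter counts in billions, taken from the model names in this thread.
# Reading "A3B" as roughly 3B active parameters per token is an assumption.
models = {
    "Qwen3.6-27B (dense)": {"total_b": 27.0, "active_b": 27.0},
    "Qwen3.6-35B-A3B (MoE)": {"total_b": 35.0, "active_b": 3.0},
}

def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the weights that take part in computing each token."""
    return active_b / total_b

for name, p in models.items():
    frac = active_fraction(p["total_b"], p["active_b"])
    print(f"{name}: {p['active_b']:g}B of {p['total_b']:g}B active per token ({frac:.0%})")
```

So the 35B model has more weights to hold in memory, but on these figures it touches fewer of them per token than the 27B, which is why it tends to be faster while the dense model can still come out smarter.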

by bakies 4 hours ago

Thanks! Dense models have been slow on my compute, but I'll give it a try. If it's not toooooo slow then it's fine; I mostly fire and forget agents anyway.

Edit: seems fast! I'll try it out some more, thanks again.

by diseasedyak 5 hours ago

I'm running qwen3.6:35b:iq4, the IQ4_XS quant. It takes 18 GB of RAM with a 131k context window and seems to be really good. I have it running local stuff via Hermes, and use a cloud model via Ollama (Deepseek V4-Pro) for heavy lifting.

by tarruda 4 hours ago

If your Framework Desktop is the 128GB Strix Halo, I recommend giving Qwen 3.5 122B-A10B a shot.

This Q5_K_M quant should be near lossless and fit with full 256K context in about 100GB of RAM: https://huggingface.co/AesSedai/Qwen3.5-122B-A10B-GGUF
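For a rough sanity check on that figure, here is a minimal sketch of the arithmetic. The 5.5 bits per weight used for Q5_K_M is an assumed average, not a number from the linked repo, and real GGUF files vary by model. The helper name is made up for illustration.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    return params_billions * bits_per_weight / 8.0

# Assumption: a rough average for Q5_K_M; check the actual file sizes in the repo.
ASSUMED_Q5_K_M_BPW = 5.5

weights_gb = weight_memory_gb(122.0, ASSUMED_Q5_K_M_BPW)
budget_gb = 100.0  # "about 100GB" from the comment above

print(f"weights: about {weights_gb:.1f} GB")
print(f"left for KV cache and overhead: about {budget_gb - weights_gb:.1f} GB")
```

On that assumption the weights alone come to a bit over 80 GB, so the remainder of the ~100GB would be what the 256K context and runtime overhead have to fit into.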

by Catloafdev 3 hours ago

3.6 scores better on coding across the board.

Edit: specifically Qwen 3.6 27B beats that on coding and agentic workflows.

by bakies 3 hours ago

I'll keep this in mind.

by andy99 6 hours ago

Could you please share which coding agent you are using with it?

by waezel 5 hours ago

Crush: https://github.com/charmbracelet/crush/

The Q8_K_XL MTP model from Unsloth: https://huggingface.co/unsloth/Qwen3.6-35B-A3B-MTP-GGUF

by bakies 4 hours ago

I settled on opencode after trying goose and aider as well. I'll probably try some more, but opencode works similarly to Claude Code, which is my main agent.

I serve the model with ollama and am thinking about replacing ollama but haven't looked into it.

I have openwebui for chat if I want that too, but don't really use it.

by oneshtein 5 hours ago

npx @oh-my-pi/pi-coding-agent

by npodbielski 5 hours ago

I am using Mistral Vibe.

by NamlchakKhandro 6 hours ago

Pi

by j45 6 hours ago

It feels sometimes like optimizations are only starting.

by trollbridge 5 hours ago

I’m beginning to suspect the closed SOTA labs were doing all these optimisations, keeping quiet about it, and just charging us out the yinyang for inference.
