Same here. I now use Qwen 3.6 27B (Q6 quant) with llama.cpp on an RTX 5090, exclusively through the pi agent. Because it's local, I never have to think about token pricing, quotas, time of day, or data sensitivity. I have power-limited the GPU from 600W to 450W, which keeps the system whisper quiet during inference.

I have become so "lazy" (in a good way) that I've started using the model for lots of mundane daily tasks on top of coding:

  * "commit this on a branch, push, create a PR and assign $nickname for review"
  * "Use the Stripe CLI to download all open and overdue invoices and reconcile them with this CSV export from our bank account."
  * "Use these Elasticsearch credentials to summarise what kind of operations are causing load at the moment."
  * "Tell me if our codebase already supports X and where it's implemented."

I am forced to use Qwen 3.6 27B at work and have found it next to useless. I might as well do all the work manually rather than have it implement another mess or get the debugging entirely wrong.

It feels like anything less than Sonnet is just a waste of time, apart from use as a smarter search function.

It also strikes me as strange that you would mention Codex for UI polish, as it's notoriously bad at UI, and far behind Claude Opus. Altman specifically posted that they are working to improve this for the next model release.

Qwen3.5-122B is actually Qwen3.5-122B-A10B. The "A10B" means it is a mixture-of-experts (MoE) model in which only about 10B parameters are activated for any given token. Qwen3.6-27B, by contrast, is a dense model: all 27B parameters are used for every token. So for many tasks, you'd expect the 27B dense model to do better than the 122B-A10B model.

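To make the "active parameters" point concrete, here is a back-of-envelope sketch. It assumes the common approximation of about 2 FLOPs per active parameter per generated token (an assumption, not something stated in the comment above) and uses only the parameter counts named in the thread. It says nothing about output quality; it just shows how much of each model does work per token.

```python
# Rough rule of thumb (assumption, not from the thread): a forward pass costs
# about 2 FLOPs per *active* parameter for each generated token.

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs needed to produce one token."""
    return 2.0 * active_params


models = {
    # name: (total parameters, parameters active per token)
    "Qwen3.6-27B (dense)": (27e9, 27e9),
    "Qwen3.5-122B-A10B (MoE)": (122e9, 10e9),
}

for name, (total, active) in models.items():
    share = active / total
    print(
        f"{name}: {active / 1e9:.0f}B of {total / 1e9:.0f}B params active "
        f"({share:.0%}), ~{flops_per_token(active) / 1e9:.0f} GFLOPs/token"
    )
```

The MoE model stores far more weights, but per token it does less than half the arithmetic of the dense 27B model, which is one intuition for why the dense model can punch above its size.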
Why 27B rather than 35B? Is MoE that much worse for coding?

Yeah, MoE is a little worse for the same total size, but you can often run bigger MoEs at respectable speeds even with the weights offloaded to CPU RAM. Dense models really need to fit 100% in VRAM.
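One way to see why offloading hurts dense models more than MoEs: token generation is often memory-bandwidth bound, so each token requires reading every *active* weight once. The sketch below estimates a decode-speed ceiling under that assumption. The bandwidth figures and the 6.5 bits per weight are illustrative guesses, not measurements of any particular GPU, RAM, or llama.cpp quant.

```python
# Back-of-envelope decode-speed ceiling. Assumption: generating one token is
# memory-bandwidth bound, so every *active* weight is read once per token.
# All bandwidth and bits-per-weight figures below are illustrative guesses.

BITS_PER_WEIGHT = 6.5   # assumed: roughly a 6-bit quant including scales
GPU_BW_GB_S = 1800.0    # assumed: high-end consumer GPU VRAM bandwidth
CPU_BW_GB_S = 80.0      # assumed: dual-channel DDR5 system RAM bandwidth


def weight_bytes(params: float, bits_per_weight: float = BITS_PER_WEIGHT) -> float:
    """Bytes needed to store `params` weights at the given precision."""
    return params * bits_per_weight / 8


def max_tokens_per_s(active_params: float, bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s if reading the active weights is the only cost."""
    return bandwidth_gb_s * 1e9 / weight_bytes(active_params)


DENSE_ACTIVE = 27e9   # dense 27B: every weight is read for each token
MOE_TOTAL = 122e9     # 122B-A10B: all weights must be stored somewhere...
MOE_ACTIVE = 10e9     # ...but only ~10B of them are read per token

scenarios = {
    "27B dense, all in VRAM": max_tokens_per_s(DENSE_ACTIVE, GPU_BW_GB_S),
    "27B dense, from system RAM": max_tokens_per_s(DENSE_ACTIVE, CPU_BW_GB_S),
    "122B-A10B MoE, from system RAM": max_tokens_per_s(MOE_ACTIVE, CPU_BW_GB_S),
}

print(f"27B dense weights: {weight_bytes(DENSE_ACTIVE) / 1e9:.1f} GB")
print(f"122B MoE weights:  {weight_bytes(MOE_TOTAL) / 1e9:.1f} GB")
for name, tps in scenarios.items():
    print(f"{name}: <= {tps:.1f} tok/s")
```

Under these assumptions the MoE streamed from system RAM has a ceiling 2.7x higher than the dense model streamed from RAM (27B vs 10B active weights read per token), while the dense model only becomes fast once it sits entirely in VRAM. Real throughput is lower than these ceilings because of KV-cache reads, compute, and partial offload layouts.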