Benchmark               │ Qwen 3.6 35B-A3B │ Haiku 4.5
────────────────────────┼──────────────────┼────────────────────────
SWE-Bench Verified      │ 73.4             │ 66.6
SWE-Bench Multilingual  │ 67.2             │ 64.7
SWE-Bench Pro           │ 49.5             │ 39.45
Terminal Bench 2.0      │ 51.5             │ 61.2 (Warp), 27.5 (CC)
LiveCodeBench           │ 80.4             │ 41.92
These are all public benchmarks, of course, so I'd expect some memorization/overfitting to be happening. In my experience, the proprietary models usually have a bit of an advantage on real-world tasks. That said, even Qwen3.5 35B A3B benchmarks roughly on par with Haiku 4.5, so Qwen3.6 should be a noticeable step up.
https://artificialanalysis.ai/models?models=gpt-oss-120b%2Cg...
No, these benchmarks are not perfect, but short of trying it yourself, this is the best we've got.
Compared to frontier coding models like Opus 4.7 and GPT 5.4, Qwen3.6 35B A3B is not going to feel smart at all, but for something that can run quickly at home... it's impressive how far this stuff has come.