Benchmark               │ Qwen 3.6 35B-A3B │ Haiku 4.5
────────────────────────┼──────────────────┼────────────────────────
SWE-Bench Verified      │ 73.4             │ 66.6
SWE-Bench Multilingual  │ 67.2             │ 64.7
SWE-Bench Pro           │ 49.5             │ 39.45
Terminal Bench 2.0      │ 51.5             │ 61.2 (Warp), 27.5 (CC)
LiveCodeBench           │ 80.4             │ 41.92
These are all public benchmarks, of course, so I'd expect some memorization/overfitting to be happening. In my experience, the proprietary models usually have a bit of an advantage on real-world tasks. That said, even Qwen3.5 35B A3B benchmarks roughly on par with Haiku 4.5, so Qwen3.6 should be a noticeable step up.
https://artificialanalysis.ai/models?models=gpt-oss-120b%2Cg...
No, these benchmarks are not perfect, but short of trying it yourself, this is the best we've got.
Compared to frontier coding models like Opus 4.7 and GPT 5.4, Qwen3.6 35B A3B is not going to feel smart at all, but for something that can run quickly at home... it's impressive how far this stuff has come.