by steveharing1 2 hours ago

comments

by rnadomvirlabe 1 hour ago

Why no doubt?

by captainbland 1 hour ago

No comparison with competitor models other than the previous Granite version strongly implies that it does not compete well with other comparable models. At least, this is the most reasonable assumption until data comes out to the contrary.

by 2ndorderthought 1 hour ago

Qwen 3.6 is effectively a pocket-sized frontier model. It's really surprising, for me anyway.

by steveharing1 1 hour ago

Because Qwen 3.6 punches way above its weight. Granite 8B is impressive, but Qwen still wins on raw capability, especially for coding.

by rnadomvirlabe 1 hour ago

You just asserted the same thing again. Why do you say this is the case?

by 2ndorderthought 1 hour ago

Qwen scores above Sonnet in coding benchmarks and runs locally. In personal use it's really good. Anecdotally, others have used it successfully for vibe coding and agentic coding. Not toy problems. Not a toy model.

Qwen3.6 raises the bar for models of its size. There really isn't a comparison in my opinion.

by noodletheworld 1 hour ago

Having tried it.

Qwen is really good.

Also, generally, it makes sense. 8B models are generally not very good^.

That this 8B model is decent is impressive, but that it could perform on par with a good model 4 times as large is a daydream.

^ - To be polite. The small models + tool use for coding agents are almost universally ass. Proof: my personal experience. I've tried many of them.

by irishcoffee 1 hour ago

So it’s just like, your opinion, man?

edit: It was a play on The Big Lebowski, folks.

by Terretta 46 minutes ago

College SAT scores do not tell you how the dev applying for your open back end systems engineering job is going to do once they're in your workplace harness.

Nor do class standings, nor hackerrank and the like.

What will tell you is asking them to fix a thing in your codebase. Once you ask an LLM to do that, a dozen times, I'd argue it's no longer "just your opinion man", it's a context-engineered performance x applicability assessment.

And it is very predictive.

But it's also why someone doing well at job A isn't necessarily going to be great at B, and why being bad at A doesn't necessarily mean being bad at B.

I've often felt we should normalize a sort of mutual try-buy period where the job-change seeker and the company can spend a series of days together without harming the seeker's existing employment, to derisk the mutual learning. ESPECIALLY to derisk the career change for the applicant, who only gets one timeline to manage, as opposed to the company, which considers the applicant fungible.

But back to the LLM: yeah, the only valid opinion on whether it works for you is not a benchmark, it's an informed opinion from 'using it in anger'.

by robotmaxtron 32 minutes ago

the (dead) internet is full of opinions exactly like this

by brazukadev 27 minutes ago

you tried qwen3.6 and you think it is not good?

by robotmaxtron 17 minutes ago