What do you use local models for? I'm asking generally about possible applications of these smaller models.
reply
Would you mind sharing your hardware setup and use case(s)?
reply
Not the GP, but the new Qwen-Coder-Next release feels like a step change, at 60 tokens per second on a single 96GB Blackwell. And that's at full 8-bit quantization and 256K context, which I wasn't sure was going to work at all.

It is probably enough to handle a lot of what people use the big-3 closed models for. Somewhat slower and somewhat dumber, granted, but still extraordinarily capable. It punches way above its weight class for an 80B model.
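For anyone wondering why 8-bit plus 256K context was in doubt, here's the rough back-of-envelope I had in mind; the numbers are my own assumptions, not measured figures from the release:

```python
# Back-of-envelope VRAM estimate for an ~80B model at 8-bit quantization.
# All numbers here are rough assumptions, not measured values.

params = 80e9            # ~80B total parameters
bytes_per_param = 1.0    # 8-bit quantization ~= 1 byte per weight

weights_gb = params * bytes_per_param / 1e9
print(f"Weights alone: ~{weights_gb:.0f} GB")            # ~80 GB

# Whatever is left on a 96GB card has to hold the KV cache,
# activations, and runtime overhead.
vram_gb = 96
print(f"Headroom for KV cache + overhead: ~{vram_gb - weights_gb:.0f} GB")
```

So the weights already eat most of the card, and the long context has to squeeze into the remaining ~16 GB, which is why I was surprised it worked.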

reply
"Single 96GB Blackwell" is still $15K+ worth of hardware. You'd have to use it at full capacity for 5-10 years to break even when compared to "Max" plans from OpenAI/Anthropic/Google. And you'd still get nowhere near the quality of something like Opus. Yes there are plenty of valid arguments in favor of self hosting, but at the moment value simply isn't one of them.
reply
Agree, these new models are a game changer. I switched from Claude to Qwen3-Coder-Next for day-to-day work on dev projects and don't see a big difference. I just use Claude when I need comprehensive planning or review. Running Qwen3-Coder-Next-Q8 with 256K context.
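In case it helps anyone replicate the setup: most local servers (llama.cpp's llama-server, vLLM, LM Studio, etc.) expose an OpenAI-compatible endpoint, so swapping between Claude and a local model is mostly a matter of pointing the client at a different base URL. A minimal sketch, assuming a local server on port 8000; the URL, key, and model name are placeholders:

```python
# Minimal sketch: talking to a local OpenAI-compatible server.
# The base_url, api_key, and model name are placeholders --
# use whatever your local server actually exposes.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local endpoint, not api.openai.com
    api_key="not-needed-locally",         # most local servers ignore the key
)

resp = client.chat.completions.create(
    model="qwen3-coder-next-q8",  # placeholder; match your loaded model's name
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
)
print(resp.choices[0].message.content)
```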
reply
IIRC, that new Qwen model has 3B active parameters, so it's going to run well enough even on far less than 96GB of VRAM. (Though more VRAM may of course help with enabling the full available context length.) Very impressive work from the Qwen folks.
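The intuition, roughly: with a sparse MoE, only the active experts' weights have to be streamed per token, so decode speed is bounded by memory bandwidth over ~3B parameters rather than the full 80B. A rough sketch of that bound; the bandwidth figures are assumptions, and real throughput will be lower:

```python
# Rough decode-speed upper bound for a sparse MoE: per token you only stream
# the active parameters, not all 80B. Bandwidth numbers are assumptions.

active_params = 3e9      # ~3B active parameters per token
bytes_per_param = 1.0    # 8-bit quantization
gb_per_token = active_params * bytes_per_param / 1e9   # ~3 GB read per token

for name, bandwidth_gb_s in [("GPU VRAM (~1.8 TB/s)", 1800),
                             ("CPU DDR5 (~100 GB/s)", 100)]:
    tok_s = bandwidth_gb_s / gb_per_token
    print(f"{name}: upper bound ~{tok_s:.0f} tokens/s")
```

Which is why partially offloading the experts to system RAM can still give usable speeds.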
reply