This is discussed in the article:

"My personal impression is that within these quantizations Qwen 3.6 27B is as good as (or maybe slightly better than) DwarfStar4. Though, I won’t be surprised if for longer context projects DS4 has an edge."

by sfifs6 hours ago|

I've used both. DeepSeek-4 Flash Q2 (with the last 6 layers at Q4) with DwarfStar, which just about fits in 128 GB, is definitely superior IMO. My contexts typically run 50-100k tokens. Throughput tends to be about 12-13k tok/sec, which is just about acceptable.
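
For anyone wanting to sanity-check whether a mixed quant like this fits in a given amount of memory, here is a rough back-of-the-envelope sketch. It assumes parameters are spread evenly across layers and counts weights only (no KV cache, embeddings, quantization scales, or runtime overhead). Real quant formats generally use somewhat more bits per weight than their nominal name suggests. The function name and the numbers passed in are hypothetical placeholders, not the actual specs of any model mentioned here.

```python
def quantized_weight_gib(n_params, n_layers, low_bits, high_bits, high_layers):
    """Rough weight-only footprint (GiB) for a mixed-precision quant.

    Assumption: parameters are spread evenly across layers. Ignores
    embeddings, quantization scales, KV cache, and runtime overhead.
    """
    per_layer = n_params / n_layers
    low = per_layer * (n_layers - high_layers) * low_bits   # bits in low-precision layers
    high = per_layer * high_layers * high_bits              # bits in high-precision layers
    return (low + high) / 8 / 2**30                         # bits -> bytes -> GiB


# Hypothetical numbers for illustration only, not a real model's specs.
size = quantized_weight_gib(
    n_params=100e9, n_layers=50, low_bits=2, high_bits=4, high_layers=6
)
print(f"{size:.1f} GiB for weights alone")
```

Whatever this returns is a lower bound: long contexts in the 50-100k range add KV cache on top, so leave headroom.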

by drnick123 hours ago|

Works beautifully on a 3090, very usable speed. Don't expect Opus 4.8-level performance, but there are some things you just need to keep local.

by ljosifov23 hours ago|

True - they are workhorses. Not super bright, but good enough for lots of everyday tasks. I've found the sweet spot is turning thinking off, as it adds little or no value while increasing the token count and waiting time. The last 27B I used was https://huggingface.co/Jackrong/Qwopus3.6-27B-Coder-GGUF - specifically post-trained a bit to run with thinking off. I saw today that the 35B-A3B MoE from the same HF account is out; downloading that rn to try.

by kroaton20 hours ago|

Please don't use that garbage. Just use the base Qwen models or Nex/Orinth, as those are the only properly post-trained finetunes. The Qwopus models are marketing.

by aand1620 hours ago|

Can you expand on why Qwopus is not recommended and what "Nex/Orinth" brings to the table?

by kroaton20 hours ago|

"DeepSeek-V4-Flash will fit" At Q2, 2-bit? Lobotomized to death.

by ljosifov10 hours ago|

Hobbled - but not to death, the few times I use it (usually on a plane). I tried a 2-bit quant of a version with experts reduced 20% by REAP. :-O That's the biggest that fits on my own h/w (a 3-year-old M2 Max, 96 GB). It's coherent, it does work, and it doesn't fall apart on casual use. IDK if it's better than a dense 27B. I think the 27B was slower on the same h/w. DS4F has a 1M context window. Nowadays, with hermes sessions that run for weeks, I easily get to 300k-400k context depths. DS4F's speed decline as context depth increases is better than that of any other model I've tried. (I try them all - love this stuff.) The only previous model that comes close on that is nemotron-cascade-2 (only 30B-A3B), which also has a 1M context window.