undefined

points

[-]

3.6 is the release version for Qwen. This model is a mixture of experts (MoE), so while the total model size is big (35 billion parameters), each forward pass only activates a portion of the network that’s most relevant to your request (3 billion active parameters). This makes the model run faster, especially if you don’t have enough VRAM for the whole thing.

The performance/intelligence is said to be about the same as the geometric mean of the total and active parameter counts. So, this model should be equivalent to a dense model with about 10.25 billion parameters.

by wongarsu16 hours ago|

parent|

[-]

And even if you have enough VRAM to fit the entire thing, inference speed after the first token is proportional to (activated parameters)/(vram bandwidth)

If you have the vram to spare, a model with more total params but fewer activated ones can be a very worthwhile tradeoff. Of course that's a big if

by zshn2516 hours ago|

parent|

prev|

[-]

Sorry, how did you calculate the 10.25B?

by darrenf15 hours ago|

parent|

[-]

> > The performance/intelligence is said to be about the same as the geometric mean of the total and active parameter counts. So, this model should be equivalent to a dense model with about 10.25 billion parameters.

> Sorry, how did you calculate the 10.25B?

The geometric mean of two numbers is the square root of their product. Square root of 105 (35*3) is ~10.25.

by cshimmin16 hours ago|

prev|

[-]

The 6 is part of 3.6, the model version. 35B parameters, A3B means it's a mixture of experts model with only 3B parameters active in any forward pass.

by zshn2516 hours ago|

parent|

[-]

Got it. Thanks

by JLO6416 hours ago|

prev|

[-]

35B (35 billion) is the number of parameters this model has. Its a Mixture of Experts model (MoE) so A3B means that 3B parameters are Active at any moment.

by zshn2516 hours ago|

parent|

[-]

~I see. What’s the 6?~

Nevermind, the other reply clears it

by joaogui116 hours ago|

prev|

[-]

3.6 is model number, 35B is total number of parameters, A3B means that only 3B parameters are activated, which has some implications for serving (either in you you shard the model, or you can keep the total params on RAM and only road to VRAM what you need to compute the current token, which will make it slower, but at least it runs)

by 16 hours ago|

prev|

[-]

deleted