Yes, it's described in this section - https://huggingface.co/Qwen/Qwen3.5-397B-A17B#processing-ult...

YaRN, but with some caveats: current implementations can reduce performance on short contexts, so only enable YaRN for long-context tasks.
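
For reference, static YaRN is usually switched on via a rope_scaling entry in the model's config.json (or the equivalent serving flag); a rough transformers sketch, where the factor 5.0 and the 200k native figure are just back-of-envelope from this thread, so check the actual model card:

  from transformers import AutoModelForCausalLM

  # Model id is the one from the thread. Note static YaRN rescales RoPE
  # for ALL sequence lengths, which is why short-context quality can regress.
  model = AutoModelForCausalLM.from_pretrained(
      "Qwen/Qwen3.5-397B-A17B",
      rope_scaling={
          "rope_type": "yarn",
          "factor": 5.0,  # assumed: ~1M target / ~200k native
          "original_max_position_embeddings": 200_000,
      },
  )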

Interesting that they're serving both on OpenRouter, and the -plus is a bit cheaper for <256k ctx. So they must have more inference goodies packed in there (proprietary).

We'll see where the third-party inference providers settle on cost.

Thanks, I'd totally missed that.

It's basically the same approach as with the Qwen2.5 and Qwen3 series, but this time with 1M context and 200k native, yay :)
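
Back-of-envelope, assuming those figures hold: that'd put the static YaRN factor around 1,000,000 / 200,000 = 5, versus the 4x (32k native -> 128k) recipe the Qwen2.5 cards showed.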

Unsure, but yes, most likely they use YaRN, and maybe trained a bit more on long context (or not).