upvote
The Nex N2 model they merged is based on Qwen 3.5, so you can swap pieces of one into the other. They found a combination of the two that did well on some benchmarks and shipped it.

In the early days of Llama there were a lot of experiments like this. There were even some interesting combinations of models where they stacked layers of different models together or even added more layers with interesting results.

But announcing that you spliced two models together isn't very impressive in 2026, so they announced that they had done their own post training and outdid the big labs. They thought nobody would look close enough to notice.

reply
Out of curiosity, how was it discovered? You would have to look for it to find this linear combination.
reply
Without the system prompt, asking its name results in it responding with the name of the model they're ripping from. That would certainly draw your eyes to the right places.
reply
Why is this? Do labs reinforce the model name during training? I was under the impression that this sort of "self-knowledge" always came from the system prompt, but I guess not...
reply
Check the linked GitHub issue. They explain their process.

Scroll past the first issue to find it. It’s further down.

reply