by hintymad1 hours ago |

by Aurornis48 minutes ago|

> A simple linear combination of every weight did not degrade the performance of the model, but enhanced it.

Enhanced it on a couple benchmarks, supposedly.

The game is to turn knobs until you get a benchmark run that shows an improvement, then ship it. There are a lot of fine tunes and chimera models on HuggingFace that are supposedly better at some specific test, but when you use them for anything else they're usually worse.

This happens with a lot of the models that are modified to remove censorship. They succeed in getting the model to emit previously censored outputs, but the overall output quality decreases.

by andai28 minutes ago|

They seem to have deleted most of the README now, but the archived version has benchmarks.

https://web.archive.org/web/20260614082641/https://huggingfa...

And the Nex benchmarks for comparison

https://huggingface.co/nex-agi/Nex-N2-Pro

Rio seems to be about halfway between Qwen 3.5 and Nex, as you'd expect?

by woadwarrior011 hours ago|

It's a well-known idea[1], although it's still surprising that something so simple works at all.

[1]: https://arxiv.org/abs/2203.05482

by kolanos59 minutes ago|

This team could have stopped here and still had something interesting (albeit not novel) to show. But the hype cycle was too tempting.

by x31249 minutes ago|

This works because Nex itself is a finetune of Qwen3.5 (https://huggingface.co/nex-agi/Nex-N2-Pro). It's merging Qwen3.5 with a Qwen3.5 finetune.

I don't believe this would work on two LLMs with different pretraining. Even if it did, you would need two LLMs with exactly the same internal activation shapes, dimensions, expert counts, and token vocabulary. Realistically, that would never happen outside of finetunes or academic experiments.
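
For anyone wondering what "a simple linear combination of every weight" looks like in practice, here is a rough sketch. It is not the code from the release: the function name `merge_state_dicts`, the mixing factor `alpha`, and the plain-dict checkpoint layout are all assumptions made for illustration. The name and shape checks are the compatibility requirement described above.

```python
def merge_state_dicts(base, finetuned, alpha=0.5):
    """Linearly interpolate two checkpoints, parameter by parameter.

    Hypothetical layout: each checkpoint is a dict mapping a parameter
    name to a flat list of floats. Real checkpoints hold tensors, but
    the arithmetic is the same elementwise operation.
    """
    # Both models must expose exactly the same parameter names.
    if base.keys() != finetuned.keys():
        raise ValueError("parameter names differ; checkpoints are not mergeable")

    merged = {}
    for name, w_base in base.items():
        w_ft = finetuned[name]
        # Every parameter must also have the same number of elements.
        if len(w_base) != len(w_ft):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # merged = (1 - alpha) * base + alpha * finetuned
        merged[name] = [(1 - alpha) * a + alpha * b for a, b in zip(w_base, w_ft)]
    return merged


if __name__ == "__main__":
    base = {"layer.weight": [0.0, 2.0]}
    finetuned = {"layer.weight": [1.0, 4.0]}
    print(merge_state_dicts(base, finetuned, alpha=0.5))
```

With `alpha=0.5` this is a plain average of the two checkpoints; `alpha=0` returns the base weights and `alpha=1` returns the finetune.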

by kristjansson17 minutes ago|

https://thickets.mit.edu

by themafia9 minutes ago|

> A simple linear combination of every weight did not degrade the performance of the model, but enhanced it.