upvote
I was reminded of "model alloys", where they randomly select a LLM for every agentic turn. This significantly boosted performance on security work.

(10 points on the benchmark, or a relative increase of over 20%)

https://news.ycombinator.com/item?id=44630724

TFA on the other hand tests two things at once: mixing models, and "fuse a model with itself",! the latter being just test time compute. e.g. Opus was able to match Fable on TFA, at the cost of costing twice as much money (and presumably time).

These two dimensions are orthogonal but can be combined for further gains.

It's not clear that every task benefits from it though. The only benched deep research, and their results are a bit weird. (e.g. they have DeepSeek outranking frontier models.)

More research needed!

reply
Agree, and I see opus and Gemini pro as “quality” on openrouter fusion, this would be super pricy if the prompts are dynamic and not optimised for caching.

I would love to hear why they have created it, what was the business case, what this is going to serve? As you said, this is pretty easy to replicate

reply
[dead]
reply