You talk to a smart, heavy model to build a plan composed of smaller steps. Then you have the heavy model spin up smaller, cheaper LLMs to actually implement the tasks.
The heavy model is basically read-only in that mode. It can read files, execute tests, etc, but it can’t write code. It just tracks what needs to be done, offloads the work to dumber LLMs, validates the task is done, and moves on to the next step.
It sort of pushes humans up the stack. Instead of having a human sitting there prompting the LLM to start the next task, you have another LLM do that loop.
It’s been on my list to try out.
"The sparsely-gated MoE layer,[21] published by researchers from Google Brain, uses feedforward networks as experts, and linear-softmax gating. Similar to the previously proposed hard MoE, they achieve sparsity by a weighted sum of only the top-k experts, instead of the weighted sum of all of them."
"Top-k experts," in case of some DeepSeek's models k=1.
https://openrouter.ai/blog/announcements/fusion-beats-fronti...