Then, in your prompt, you describe the task you want and add something like: "Supervise the implementation with a sub-agent that follows the architecture skill. Evaluate any proposed changes."
There are people who take this to its limit, and that's how you get things like agent teams. You make agents for planning, design, QA, product, engineering, review, release management, and so on, and you get them to operate and coordinate to produce an outcome.
That's what this is supposed to be, encoded as a feature instead of a best practice.
So the LLM will do something and completely fail to notice that it did it badly. But the same LLM, asked to review the result against the original requirement, will catch the problem almost every time.
The missing thing in these tools is that automatic feedback loop between the two LLMs: one in review mode, one in implementation mode.
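A minimal sketch of what such a loop could look like; `ask_llm` is a hypothetical stand-in for whatever model client you actually use, not a real API:

```python
def ask_llm(prompt: str) -> str:
    # Hypothetical placeholder: plug in your actual model client here.
    raise NotImplementedError

def implement_with_review(requirement: str, max_rounds: int = 3) -> str:
    # One LLM call in implementation mode produces a draft.
    draft = ask_llm(f"Implement the following requirement:\n{requirement}")
    for _ in range(max_rounds):
        # A second call in review mode checks the draft against the original requirement.
        review = ask_llm(
            "Review the work below strictly against the original requirement.\n"
            f"Requirement:\n{requirement}\n\nWork:\n{draft}\n"
            "Reply APPROVED if it satisfies the requirement, otherwise list the problems."
        )
        if review.strip().startswith("APPROVED"):
            break
        # Feed the review back into implementation mode and revise.
        draft = ask_llm(
            "Revise the work to address this review.\n"
            f"Requirement:\n{requirement}\n\nWork:\n{draft}\n\nReview:\n{review}"
        )
    return draft
```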
This sounds more like an automation of that idea than just N-times the work.
Just ask Claude to write a plan and review/edit it yourself. Add success criteria/tests for better results.
```
Rules:
- Only one disk can be moved at a time.
- Only the top disk from any stack can be moved.
- A larger disk may not be placed on top of a smaller disk.
For all moves, follow the standard Tower of Hanoi procedure: If the previous move did not move disk 1, move disk 1 clockwise one peg (0 -> 1 -> 2 -> 0).
If the previous move did move disk 1, make the only legal move that does not involve moving disk 1.
Use these clear steps to find the next move given the previous move and current state.
Previous move: {previous_move} Current State: {current_state} Based on the previous move and current state, find the single next move that follows the procedure and the resulting next state.
```
This is buried down in the appendix, while the main paper is full of agentic swarms this, millions of agents that, and plenty of fancy math symbols and graphs. Maybe there is more to it, but the fact that they decided to publish with such a trivial task, one that could be accomplished far more easily by having an LLM write a simple Python script, is concerning.
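For comparison, here is roughly what that script looks like, directly transcribing the quoted procedure; the peg numbering and list-based state representation are my assumptions:

```python
# pegs[i] is a list of disks on peg i, bottom to top; smaller number = smaller disk.
def next_move(pegs, previous_move):
    # Rule 1: if the previous move did not move disk 1, move disk 1 clockwise (0 -> 1 -> 2 -> 0).
    if previous_move is None or previous_move[0] != 1:
        src = next(i for i, peg in enumerate(pegs) if peg and peg[-1] == 1)
        return (1, src, (src + 1) % 3)
    # Rule 2: otherwise make the only legal move that does not involve disk 1.
    for src in range(3):
        for dst in range(3):
            if src == dst or not pegs[src]:
                continue
            disk = pegs[src][-1]
            if disk != 1 and (not pegs[dst] or pegs[dst][-1] > disk):
                return (disk, src, dst)
    return None  # no legal move left: the puzzle is solved

def apply_move(pegs, move):
    disk, src, dst = move
    pegs[dst].append(pegs[src].pop())

# Example: 3 disks starting on peg 0; stop once the full tower sits on another peg.
pegs = [[3, 2, 1], [], []]
move = None
while max(len(pegs[1]), len(pegs[2])) < 3:
    move = next_move(pegs, move)
    apply_move(pegs, move)
    print(move, pegs)
```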
this does eat up tokens _very_ quickly though :(
You run out of context so quickly, and if you don't have some kind of persistent guidance, things go south.
This would also be true of junior engineers. Do you find them impossible to work with as well?