Have you (or anyone else) tried letting agents compete? For example, give the same coding task to two models, or to the same model with a different seed, and have the reviewer choose the better result.
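
A rough sketch of what that could look like, in case it helps make the idea concrete. Everything here is hypothetical: `best_of_n`, `toy_generate`, and `toy_review` are names made up for illustration, and the two toy functions only stand in for a real model call and a real reviewer (another model, a test suite, or a human).

```python
from typing import Callable, List


def best_of_n(
    task: str,
    generate: Callable[[str, int], str],
    review: Callable[[str, str], float],
    seeds: List[int],
) -> str:
    """Run the same task once per seed and return the candidate the reviewer scores highest."""
    candidates = [generate(task, seed) for seed in seeds]
    return max(candidates, key=lambda candidate: review(task, candidate))


# Hypothetical stand-in for a model call: the seed alone decides the output.
def toy_generate(task: str, seed: int) -> str:
    return f"solution-{seed * 7 % 10}"


# Hypothetical stand-in for a reviewer: a higher trailing number counts as "better".
def toy_review(task: str, candidate: str) -> float:
    return float(candidate.split("-")[1])


if __name__ == "__main__":
    print(best_of_n("write a parser", toy_generate, toy_review, seeds=[1, 2, 3]))
```

Swapping in two different models instead of two seeds would only change what `generate` does with its second argument; the selection step stays the same.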

Some think the human brain works similarly: thousands of mini-brain cortical columns, each with a slightly different take on the situation, voting in a majority-rules system.

I wish someone would run a benchmark and competition for this kind of workflow so we could figure out what works well.

Like "Here's this consumer-grade GPU. Using only this GPU, but with whatever models and workflow you want, see how well you can do on xyz benchmark."

Contestants would get at most 1 hour and be scored on the percentage of questions answered, the percentage answered correctly, and the total time to finish.
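
Something like the following could compute those three numbers for one run. This is only a sketch under my own assumptions: `Run`, `score`, and `TIME_LIMIT_S` are hypothetical names, "% correct" is taken as a share of the questions actually answered, and how the three numbers would be combined into a single ranking is left open.

```python
from dataclasses import dataclass

TIME_LIMIT_S = 3600  # the 1 hour cap, in seconds


@dataclass
class Run:
    total_questions: int
    answered: int
    correct: int
    elapsed_s: float


def score(run: Run) -> dict:
    """Return the three metrics for one contestant run."""
    pct_answered = 100.0 * run.answered / run.total_questions
    # Assumption: correctness is measured against answered questions only.
    pct_correct = 100.0 * run.correct / run.answered if run.answered else 0.0
    return {
        "pct_answered": pct_answered,
        "pct_correct": pct_correct,
        "elapsed_s": min(run.elapsed_s, TIME_LIMIT_S),  # time is capped at the limit
        "within_limit": run.elapsed_s <= TIME_LIMIT_S,
    }


if __name__ == "__main__":
    print(score(Run(total_questions=100, answered=80, correct=60, elapsed_s=1800.0)))
```

Measuring correctness against all questions instead would penalize skipping; which one is fairer depends on whether the contest wants to reward speed or coverage.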

Call it "The Local AI Challenge."
