by elAhmo 19 hours ago

by senko 18 hours ago

It's a combination of reasoning effort (max) + enabling workflows, which orchestrate multiple sub-agents.

After some interrogation, here's how it organized the work:

1. Design workflow (rts-game-design, 11 agents, ~13 min) ran first, produced SPEC.md + DESIGN.md:

1.1 Proposals (3 parallel agents): each designed a complete RTS from a different philosophy.

1.2 Judge (1 agent): evaluated all three and synthesized one unified design, committing to specific numbers (costs, HP, map size, etc.).

1.3 Deep-dives (6 parallel agents): each wrote an implementation-ready spec for one subsystem, all consistent with the chosen design.

1.4 Synthesis (1 agent): merged the design + all six subsystem specs into one conflict-free master spec.

2. Code-review workflow (rts-code-review, 25 agents, ~5 min) ran after the main agent had written and tested the code:

2.1 Review (6 agents, read-only Explore type): each scrutinized one dimension and returned structured findings.

2.2 Verify (19 agents): every finding got its own skeptic agent told to try to refute it. Result: 19 flagged → 16 confirmed, 3 rejected as non-bugs.
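
Both workflows above have the same fan-out/fan-in shape (3 + 1 + 6 + 1 = 11 agents for design, 6 + 19 = 25 for review). Here is a rough Python sketch of that shape, not how Claude Code actually implements it: `run_agent`, `review`, and `verify` are hypothetical stand-ins that return canned strings instead of calling a model, and the function and parameter names are made up for illustration.

```python
import asyncio


async def run_agent(role: str, task: str) -> str:
    """Hypothetical sub-agent call; a real harness would invoke a model here."""
    await asyncio.sleep(0)  # stand-in for the model round trip
    return f"{role}({task})"


async def design_workflow(philosophies: list[str], subsystems: list[str]) -> str:
    # Proposals: one agent per philosophy, run concurrently.
    proposals = await asyncio.gather(*(run_agent("propose", p) for p in philosophies))
    # Judge: a single agent sees every proposal and commits to one design.
    design = await run_agent("judge", ", ".join(proposals))
    # Deep-dives: one agent per subsystem, each given the chosen design.
    specs = await asyncio.gather(
        *(run_agent("deep_dive", f"{s} for {design}") for s in subsystems)
    )
    # Synthesis: a single agent merges the design and all subsystem specs.
    return await run_agent("synthesize", "; ".join([design, *specs]))


async def review(dimension: str, canned: list[str]) -> list[str]:
    """Hypothetical reviewer; returns canned findings tagged with its dimension."""
    await asyncio.sleep(0)
    return [f"{dimension}: {c}" for c in canned]


async def verify(finding: str, refutable: set[str]) -> bool:
    """Hypothetical skeptic; True means the finding survived the refutation attempt."""
    await asyncio.sleep(0)
    return finding not in refutable


async def review_workflow(
    canned: dict[str, list[str]], refutable: set[str]
) -> tuple[list[str], list[str]]:
    # Review: one agent per dimension, run concurrently.
    batches = await asyncio.gather(*(review(d, c) for d, c in canned.items()))
    findings = [f for batch in batches for f in batch]
    # Verify: one skeptic per finding, again run concurrently.
    verdicts = await asyncio.gather(*(verify(f, refutable) for f in findings))
    confirmed = [f for f, ok in zip(findings, verdicts) if ok]
    rejected = [f for f, ok in zip(findings, verdicts) if not ok]
    return confirmed, rejected
```

Calling `asyncio.run(design_workflow([...], [...]))` with three philosophies and six subsystems would launch 11 stub agents in total. Giving each finding its own skeptic keeps the verification contexts independent of one another, which is presumably why it costs one agent per finding.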

What the main agent did in the main loop:

- Wrote all ~2,400 lines of index.html by hand from the spec.

- Did all browser testing/debugging via headless Chrome (I told it to use rodney by @simonw, love the tool :)

- Applied all 16 fixes from the review and re-verified them in the browser.

by 33MHz-i486 16 hours ago

Seems like a Rube Goldberg-esque way to consume 10x tokens. Is this really where the industry is heading?

by e12e 13 hours ago

I like to think of it like the difference between dropping a ball on a roulette wheel (you get one random number, or a random sequence if repeated) vs. dropping a ball on a carved topographic map, where valleys guide the ball to a particular outcome.

If you can stand a little AI expansion, here are a few points Gemini came up with; I think the idea has some merit:

https://g.co/gemini/share/b5b97867eeb1

(Maybe the better analogy is roulette vs pinball machine)

by derac 15 hours ago

Why is it Rube Goldbergesque? The process doesn't seem arbitrary.

by OJFord 8 hours ago

Rube Goldberg machines (or Heath Robinson contraptions) aren't arbitrary; they're complicated or contrived ways of achieving a result, often a very literal interpretation of how an automatic machine might imitate an otherwise manual action – a robotic hand movement, for example. I think it's quite a good analogy, even if agentic Goldberg works well.

by sdfsdssdfsdf 6 hours ago

Those machines are, to quote Wikipedia, "designed to perform a simple task in a comically overcomplicated way". This implies there is a much simpler way that works just as well.

I don't think the Rube Goldberg analogy works if the agentic meandering is essential complexity required to get at the results. Rube Goldberging it would be something like putting this loop inside some comically overengineered enterprise microservice web which is then found out to be running inside a Windows 98 emulator or what have you.

by Orygin 5 hours ago

> This implies there is a much simpler way that works just as well

Yes, there is: write the code yourself.

by artur_makly 2 hours ago

Just to confirm - you did not generate this plan/orchestration/harness - it did all that on its own?

by senko 1 hour ago

Correct, that's the "workflows" part they introduced in Claude Code alongside the new model.

by chrisweekly 3 hours ago

Did you start with a clean slate, or do you have a global ~/.claude/CLAUDE.md and/or specific skills, plugins, etc.?

by senko 55 minutes ago

I don't have a global CLAUDE.md, and the only non-default skill I have that was used here is the one to use the rodney[0] headless browser. I didn't expressly tell Claude to do browser testing; it decided to do it on its own.

So no extra guidance beyond the prompt.

[0] https://github.com/simonw/rodney/

by jmtame 14 hours ago

Thanks for sharing this. Going to try it out on a game inspired by Rust. It's helpful re: the point on rodney - I've had a hard time getting the testing to work well in the browser.

by tcoff91 19 hours ago

it's a brand new mode

by colechristensen 17 hours ago

Biases the model to solve problems with teams of agents