Out the other end, over about 3-4 five-hour sessions, comes roughly 85% functional code for every single item on the list. Without the automation, I'd guess you'd be looking at a team working for months, give or take. Total cost was around $50 in VM time (not counting Claude, since I'd be subscribed anyway). I'm not letting that thing anywhere near a computer I care about, and Rust compiles are resource-intensive, so I paid for a nice VM that I could smash the abort button on if it started looking at me funny.
So I liken it to buying an enormous bulldozer. If you're a skilled operator, you can move mountains, but there'll still be a lot of manual work and planning involved. It's very clearly the direction the industry is heading, once the models improve and the harnesses and orchestration mature past the point where "30% of the development effort is fixing the harness and orchestration itself", plus an additional "20% of your personal time is spent knocking two robots' heads together and getting them to actually do work".
Edit: some more details on other knock-on work: I asked for a complexity metadata field to automatically dispatch work to cheaper/faster models, set up harnesses to make opencode and codex work similarly to how Claude works, and troubleshot some bugs in the underlying gastown system. The gastown fork is public if you'd like to have a look.
Does it deliver on the "realistic" part? My experience with most models is that they make something that technically fulfills the ask, but often in a way that doesn't really capture my intent (this is with regular Claude Code, though).