Aren't you just moving the problem a bit further down the line? If you can't trust it to implement carefully specified features, why would you believe it would properly review those?
reply
It's hard to explain, but I've found LLMs to be significantly better in the "review" stage than the implementation stage.

So the LLM will do something and not catch at all that it did it badly. But the same LLM, asked to review the result against the same starting requirement, will catch the problem almost every time.

The missing thing in these tools is that automatic feedback loop between the two LLMs: one in review mode, one in implementation mode.
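To make the idea concrete, here's a rough sketch of what that loop could look like. `call_llm` and the prompts are placeholders (not any particular vendor's API); the point is just the implement -> review -> revise cycle, capped at a few rounds so it can't spin forever.

    # Sketch of a two-role implement/review loop. `call_llm` is a stand-in
    # for whatever model API you use; it is not a real library call.

    def call_llm(prompt: str) -> str:
        """Placeholder: send `prompt` to your model and return its reply."""
        raise NotImplementedError

    def implement_with_review(requirement: str, max_rounds: int = 3) -> str:
        # Implementation role: first attempt at the requirement.
        draft = call_llm(f"Implement the following requirement:\n{requirement}")
        for _ in range(max_rounds):
            # Review role: check the draft strictly against the original requirement.
            review = call_llm(
                "Review the implementation below strictly against the requirement.\n"
                f"Requirement:\n{requirement}\n\nImplementation:\n{draft}\n\n"
                "Reply APPROVED if it fully satisfies the requirement; "
                "otherwise list every problem you find."
            )
            if review.strip().startswith("APPROVED"):
                break
            # Feed the reviewer's findings back to the implementation role.
            draft = call_llm(
                f"Requirement:\n{requirement}\n\nPrevious attempt:\n{draft}\n\n"
                f"Reviewer feedback:\n{review}\n\n"
                "Produce a corrected implementation."
            )
        return draft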

reply
I've noticed this too and am wondering why this hasn't been baked into the popular agents yet. Or maybe it has and it just hasn't panned out?
reply
Anecdotally, I think this is in Claude Code. It's pretty frequent to see it implement something, then declare it "forgot" a requirement and go back and alter or add to the implementation.
reply
How does this not use up tokens incredibly fast though? I have a Pro subscription and bang up against the limits pretty regularly.
reply
It _does_ use up tokens incredibly fast, which is probably why Anthropic is developing this feature. This is mostly for corporations using the API, not individuals on a plan.
reply
I'd love to see a breakdown of the token consumption of inaccurate/errored/unused task branches for claude code and codex. It seems like a great revenue source for the model providers.
reply
Yeah, that's what I was thinking. They do have an incentive to not get everything right on the first try, as long as they don't overdo it... I also feel like they try to drive up token usage by asking unnecessary follow-up questions that the user may say yes to, etc.
reply
It does use tokens faster, yes.
reply