Your assertion, then, is that even a one-sentence prompt is as good as a five-section markdown spec with detailed coding-style guidance and a feature-by-feature specification. This is simply not true; the detailed spec and guidance will always outperform the one-sentence prompt.
Plan mode is useful because if you make corrections during development, it does that silly thing where it leaves comments referring to your corrections.
> How do you deal with the comments sometimes being relatively noisy for humans?
To an extent, that is a function of tweaking the prompt to get the desired level of detail and signal-to-noise ratio from the LLM, e.g. constraining the word count it can use for comments.

We have a small team of approvers reviewing every PR, and since we can't see the original prompt and the flow of interactions with the agent, this approach lets us see that by proxy when reviewing the PR, so it is immensely useful.
This goes even for things like enum values. Why is this enum here? What is its use case? Is it needed? Having the reasoning dumped out lets us understand what the LLM is "thinking".
(Of course, the biggest benefit is still that the LLM sees the reasoning from an earlier session again when reading the code weeks or months later).
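For illustration, this is the kind of reasoning-bearing comment I mean. The enum and its domain are made up; the point is that each comment records *why* a value exists, not just what it is:

```python
from enum import Enum

class RefundStatus(Enum):
    """States a refund can be in. (Hypothetical example.)"""
    # Created before the payment provider confirms the refund request.
    PENDING = "pending"
    # Needed because some providers only refund per line item, so a
    # refund can be partially settled; this was a prompt-level business
    # decision that would otherwise be invisible in the code.
    PARTIAL = "partial"
    # Terminal state; a refund never transitions back out of SETTLED.
    SETTLED = "settled"
```

A later agent (or human reviewer) grepping past this enum sees the business reasoning inline instead of having to reconstruct it.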
Function docs: written for AI, with a clear trigger ("use when X or Y") and usage examples.
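A sketch of what such a doc looks like in practice (the function name and domain are invented for illustration):

```python
def normalize_sku(raw: str) -> str:
    """Normalize a vendor SKU to our canonical uppercase, dash-free form.

    Use when: ingesting vendor feeds, or comparing SKUs across systems.
    Do NOT use for display; keep the vendor's original formatting there.

    Example:
        >>> normalize_sku("ab-123 x")
        'AB123X'
    """
    # Keep only alphanumeric characters, uppercased.
    return "".join(ch for ch in raw.upper() if ch.isalnum())
```

The explicit "use when / do not use" trigger gives an agent a cheap way to decide whether this function applies before reading its body.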
Future agents see the past reasoning as they `grep` through the code. This is especially valuable for non-obvious context like business and domain-level decisions that were in the prompt but may not show in the code itself.
I can't prove this, but I'm also guessing that this improves the LLM's output: since it writes the comment first and then the code, it is effectively writing a mini-spec right before it outputs the tokens for the function (this would make an interesting research paper).