In an interactive session, adding "Fine, but make the button red" after the model generated a first solution more than doubles the tokens used. As the model now not only gets the original code and the feature request but also the updated code plus the change request as input tokens.
Sending a feature request to an LLM and then sending the feature request again with "The button shall be red" only doubles the tokens used.
I wrote my own agent, and it sends data to LLMs in this order: "General Prompts (How to write good code)" + "The Code" + "The Feature Request". This means the KV cache will be used even when the feature request changes.
And output tokens are usually way less than the input tokens.
So I think that my approach is very lightweight on token usage compared to an interactive session.
It would be interesting to measure it for the other agents out there. Sending a feature request two times vs an interactive session.
Asking because I could guess that approach would be ok for the types of front end work that doesn't require much security or other validation.
But it sounds like it wouldn't be suitable for work in regulated industries or anything that needs to have extreme care taken.
?