upvote
The cost is far from linear though. Because of prompt caching and the fact that generally output tokens are a lot more expensive than input tokens.
reply
Agreed that it is not linear.

I wrote my own agent, and it sends data to LLMs in this order: "General Prompts (How to write good code)" + "The Code" + "The Feature Request". This means the KV cache will be used even when the feature request changes.

And output tokens are usually way less than the input tokens.

So I think that my approach is very lightweight on token usage compared to an interactive session.

It would be interesting to measure it for the other agents out there. Sending a feature request two times vs an interactive session.

reply
"Make the button red" probably doesn't need an LLM at all.
reply
One tends to use LLMs for everything in practice. It‘s inconvenient to switch mode of operation
reply
That’s usually not true due to caching. It may be true if you leave a large gap in between, but if you send “make it red” right after, then it’s purely incremental
reply