undefined

points

[-]

The cost is far from linear though. Because of prompt caching and the fact that generally output tokens are a lot more expensive than input tokens.

by mg10 hours ago|

parent|

[-]

Agreed that it is not linear.

I wrote my own agent, and it sends data to LLMs in this order: "General Prompts (How to write good code)" + "The Code" + "The Feature Request". This means the KV cache will be used even when the feature request changes.

And output tokens are usually way less than the input tokens.

So I think that my approach is very lightweight on token usage compared to an interactive session.

It would be interesting to measure it for the other agents out there. Sending a feature request two times vs an interactive session.

by ryan_glass11 hours ago|

prev|

[-]

"Make the button red" probably doesn't need an LLM at all.

by Tepix9 hours ago|

parent|

[-]

One tends to use LLMs for everything in practice. It‘s inconvenient to switch mode of operation

by Chirono11 hours ago|

prev|

[-]

That’s usually not true due to caching. It may be true if you leave a large gap in between, but if you send “make it red” right after, then it’s purely incremental