If input tokens dominate the cost to that extent, this implies that major gains are possible by making better use of caching. You could basically ask the model to do a one-time "compaction" step, including a dump of the relevant portions of the code, and use that as the cached prefix for a large number of "swarm" subagent calls.
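
To put rough numbers on that, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function name `input_cost`, the token counts, and especially `cached_rate` (the fraction of the normal input price charged for a cached token). It also ignores any cache-write surcharge and cache expiry, which vary by provider.

```python
def input_cost(prefix_tokens, suffix_tokens, n_agents, cached_rate=0.1):
    """Compare input-token cost (in "full-price token" units) for a swarm of
    subagents that all share the same prefix.

    cached_rate is a hypothetical discount: 0.1 means a cached token costs
    10% of an uncached one. Real pricing differs per provider.
    """
    # No caching: every subagent pays full price for prefix + its own suffix.
    without_cache = n_agents * (prefix_tokens + suffix_tokens)

    # With caching: the first call pays full price and populates the cache;
    # the remaining calls pay the discounted rate for the shared prefix.
    first_call = prefix_tokens + suffix_tokens
    later_calls = (n_agents - 1) * (cached_rate * prefix_tokens + suffix_tokens)
    return without_cache, first_call + later_calls


# Hypothetical workload: 200k-token compacted prefix, 2k-token task per agent.
without_cache, with_cache = input_cost(200_000, 2_000, n_agents=50)
print(f"without cache: {without_cache:,.0f}")
print(f"with cache:    {with_cache:,.0f}")
```

Under these made-up numbers the swarm goes from about 10.1M to about 1.28M full-price-equivalent input tokens. The saving grows with the number of subagents and with the size of the shared prefix relative to each agent's own task.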
reply
Did you experiment with giving the agent better tools to navigate and document the codebase? ASTs, language servers, and so on?

A million tokens (not cached) sounds like a lot.

reply
The target codebase is very large. A million tokens is a drop in the proverbial bucket.

I still don't understand how caching helps me very much. I must be misunderstanding it, because I thought the user's prompt (which is the biggest variable) necessarily comes before all of these token-intensive tool calls. How can we cache the reading of the codebase if the prefix is always moving?

reply
If an agent makes a tool call, the LLM provider receives the full context again once the result of the tool call is available, in order to decide the next move. Everything up to the point of the tool call no longer changes and can thus, in theory, be cached. If the agent makes a ton of tool calls, every one of them should hit the cache.

A new instruction from the user is appended at the end if it is given in the same conversation. It therefore only affects the cacheability of the original agent prompt, not of subsequent tool calls.
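
A toy simulation may make this clearer. It is only a sketch under simplifying assumptions: the cache holds just the previous request, it never expires, and it counts whole messages rather than tokens. The names `common_prefix_len` and `simulate` are made up for the example and do not correspond to any provider's API.

```python
def common_prefix_len(a, b):
    """Number of leading items that two lists have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def simulate(requests):
    """For each request, return (cached, fresh) message counts, assuming the
    cache holds exactly the previous request's messages."""
    previous = []
    stats = []
    for req in requests:
        hit = common_prefix_len(previous, req)
        stats.append((hit, len(req) - hit))
        previous = req
    return stats


# The conversation only ever grows at the end.
conversation = ["system prompt", "user: fix the bug"]
requests = []
for i in range(3):
    # The full context is sent to the model again before each next move.
    requests.append(list(conversation))
    conversation += [f"tool_call_{i}", f"tool_result_{i}"]

# A follow-up instruction in the same conversation is appended at the end.
conversation.append("user: also add a test")
requests.append(list(conversation))

stats = simulate(requests)
for cached, fresh in stats:
    print(f"cached={cached} fresh={fresh}")
```

Only the very first request has nothing to reuse. Every later request, including the one carrying the new user instruction, reuses everything sent before it and pays full price only for what was appended.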

reply
Often it seems to me that using MA is like letting a million monkeys loose.

Has AI forgotten about high-level design? Surely all it needs to know is what the methods, objects, or functions in the codebase actually do, plus the actual code it is meant to be fixing?

I wonder if half the issue is that the LLMs try to change too much?

reply
> The target codebase is very large.

But does every prompt need the entire codebase?

reply
How could it not? Can you ever guarantee accurate answers about a book you haven't entirely read?
reply