I still don't understand how caching helps me very much. I must be misunderstanding it because I thought the user's prompt (which is the biggest variable) necessarily sits prior to all of these token intensive tool calls. How can we cache the reading of codebase if the prefix is always moving?
A new instruction by the user will be appended at the end if it done in the same conversation. Thus only has influence on the cacheability of the original agent prompt, but not of subsequent tool calls.
Has ai forgotten about high level design? Surely all it needs to know is what the methods, objects or functions in the code base actually does and the actual code it is meant to be fixing?
I wonder if half the issues is that the LLM try to change too much?
But, does every prompt need the entire codebase?