A lot of enterprises use Github Copilot which has per-request pricing model which effectively means unlimited tokens which eliminates this issue.
The harness receives a response, has to parse out the tool call, execute it and then start a new request with the tool call result.
Nope, not unless you are doing steering.
Each new prompt = new request, but tool calls don't count.