undefined

points

[-]

Options (a) and (b) add more bloat to the model’s context window and option (c) seem to reduce to having similar functions that already existed. There is also the option to trick the LLM that it’s using the old function exactly as-is, while the harness abstracts away a completely different methodology. Cursor often does exactly this: they use an internally built vectorized search when the model calls the default “find” bash command. The LLM is none the wiser that the function’s implementation is completely different.

Regardless, in any of these cases, the implementation for any of these above options may be vastly superior to the “naive” implementation for agents — but then the parent comment here is right that an engineer would need to justify their implementation to users, not just make a loud conjecture. It’s a non-trivial claim to say that a bespoke solution not present in tool-use training and accounting for context-rot would result in a better performing model. Moreover, justifying an agent-specific efficiency gain that humans wouldn’t benefit from makes the claim even more non-trivial. Using Sagan’s razor, it’s then reasonable for people to ask for a comparably non-trivial amount of evidence.