This is exactly what I am trying to solve and I have what I call smart repositories that demonstrates this at
https://github.com/gitsense/smart-ripgrep
https://github.com/gitsense/smart-codex
The issue I am finding is, getting the agent to pull what it needs, even when the data is there is still challenging since LLMs are trained on blind discovery where the pattern is:
grep -> read -> grep -> read ...
What is working for me now is thanks to Pi (pi.dev). I am working on a pi-brains extension that makes it dead simple to control the lifecycle for an agent so if I detect that it uses `rg` without `gsc rg`, I can block the agent and inject a steering message that says always search with context.
I can also see if they try to "read" without first looking at the files metadata and so forth.
I'm finalizing things right now, but I think pi with my brains extension should allow domain experts to better guide agents so they can find what they need, when they need it.
But your LLM training corpus covered that, right? /i