Exactly this. A tool called qmd is what I use for the hybrid search portion. It also uses local LLMs to provide summaries of your own markdown data. My agents use both depending on what type of search they are doing, and both provide good results.
That assumes the agent knows which one is better. And baking in which one is better via post-training would require a study like this to establish where each one works well.
I’ve got a custom ultra-high-performance streaming semantic search that I exposed as a tool, and the RL bias in Claude is almost insurmountable without copious and consistent steering. Codex will follow instructions and use the tools I ask it to, but for god’s sake, between Claude asking to take a nap because it’s getting late in the session and it regressing to RL-biased tools like grep, it’s maddening. When I can get it to use my compositional tools, tool calls drop from something like 20-50 to 3-4, but it’s almost impossible to steer.