The use case is well defined here, so let’s not jump the gun. Text search, as with code, is a relatively simple problem compared to, for example, capturing the intrinsic semantic content of a book. I think the moral here is that RAG is not a silver bullet; the Claude Code team came to the same conclusion.
Modern OCR tooling is quite good. If the knowledge you are adding to your search database can be OCR'd, then I think the approach we took here can be generalized.