3 would be the hardest but most useful thing. The problem is that it's scattered around different computers and networks that don't talk to each other. We could have a file in SharePoint on one system referencing a file on an SMB share on a completely different network. It's a big pain and very difficult to work with, but it's not something I expect software running on my computer, with access to only a subset of the information, to be able to solve.
And I hear you on cross-network fragmentation: in a lot of real environments the hardest part isn't search quality, it's that data lives on different machines and different networks, and you only have partial visibility at any given time.
If you had to pick, would you rather have:
1. instant local indexing over whatever is reachable right now (even if incomplete), or
2. a lightweight distributed approach that can index in-place on each machine/network and only share metadata/results across boundaries? (rough sketch of what I mean by (1) just below)
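To make (1) concrete, here's a minimal sketch, assuming nothing more than a plain list of source roots (local folders, mounted SMB shares, a synced SharePoint library). None of the names come from an existing tool; they're just illustrative:

    # Minimal sketch of option (1): index whatever is visible right now,
    # and record what wasn't reachable instead of treating it as an error.
    # All names here (IndexRun, index_reachable) are made up for illustration.
    import os
    import time
    from dataclasses import dataclass, field

    @dataclass
    class IndexRun:
        indexed: dict[str, float] = field(default_factory=dict)  # path -> mtime
        unreachable: list[str] = field(default_factory=list)     # roots/files we couldn't see this run
        started_at: float = field(default_factory=time.time)

    def index_reachable(roots: list[str]) -> IndexRun:
        """Walk every root that is reachable right now; note the rest."""
        run = IndexRun()
        for root in roots:
            if not os.path.isdir(root):          # share not mounted / network down
                run.unreachable.append(root)
                continue
            for dirpath, _dirs, files in os.walk(root):
                for name in files:
                    path = os.path.join(dirpath, name)
                    try:
                        run.indexed[path] = os.path.getmtime(path)
                    except OSError:              # file vanished or became unreadable mid-walk
                        run.unreachable.append(path)
        return run

The point is that the list of unreachable sources is a first-class output rather than a failure, and it's also roughly the kind of metadata that option (2) would share across network boundaries instead of moving the files themselves.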
I’m exploring this “latency + partial visibility” constraint as a first-class requirement (more context in my HN profile/bio if you want to compare notes).
The tension I'm trying to understand is that in a lot of real setups the “corpus” isn't deliberately curated: it's fragmented across machines, networks, and tools, and the cost of “move everything into one place” is exactly why people fall back to grep and ad-hoc search.
Do you think the right answer is always “accept the constraint and curate harder”, or is there a middle ground where sources stay where they are but you still get reliable re-entry, even if coverage is partial?
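By “middle ground” I mean something like keeping every document where it is and only storing a small re-entry record per hit. Purely a sketch with made-up field names, not a claim about how any existing system does this:

    # A hypothetical "re-entry record": just enough metadata to get back to a
    # document later without copying it anywhere. Field names are illustrative.
    import json
    import time
    from dataclasses import dataclass, asdict

    @dataclass
    class ReEntryRecord:
        source: str        # e.g. "sharepoint", "smb", "local"
        locator: str       # URL or UNC path, kept verbatim
        title: str         # whatever human-readable handle we had at index time
        last_seen: float   # when we could last actually reach it
        snippet: str = ""  # small cached excerpt for search, not the full content

    def save_records(records: list[ReEntryRecord], path: str) -> None:
        """Persist only the pointers; the documents stay where they are."""
        with open(path, "w", encoding="utf-8") as f:
            json.dump([asdict(r) for r in records], f, indent=2)

    # Example: the file stays on the share; we only keep the pointer to it.
    records = [ReEntryRecord(source="smb",
                             locator=r"\\fileserver\projects\spec_v3.docx",
                             title="spec_v3.docx",
                             last_seen=time.time())]
    save_records(records, "reentry_index.json")

If the share is offline when you search, you still get the pointer and the cached snippet back, which is often enough to know where to go once you're on the right network.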
I’m collecting constraints like this as the core design input (more context in my HN profile/bio if you want to compare notes).