I think options 1 and 4. I like the idea of 4. I was trying out one of these projects that indexes a codebase with AI to make asking questions about it easier; I ran the numbers and it was going to take 24 hours of crunching on my 7900 XTX, so I just gave up. Zero setup should mean not needing to do that.

3 would be the hardest but most useful thing. The problem is that the data is scattered across different computers and networks that don't talk to each other. We could have a file in SharePoint on one system referencing a file on an SMB share on a completely different network. It's a big pain and very difficult to work with, but it's not something I expect software running on my computer, with access to only a subset of the information, to be able to solve.

reply
That’s a really important definition of “zero setup”: no long-running indexing jobs, and no “crunch for 24 hours on my GPU” just to make search usable.

And I hear you on cross-network fragmentation — in a lot of real environments the hardest part isn’t search quality, it’s that data lives on different machines, different networks, and you only have partial visibility at any given time.

If you had to pick, would you rather have:

1. instant local indexing over whatever is reachable right now (even if incomplete; rough sketch after this list), or

2. a lightweight distributed approach that can index in-place on each machine/network and only share metadata/results across boundaries?
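
For concreteness, here's roughly what 1 could look like. This is a minimal sketch under my own assumptions (the helper name index_reachable, the example roots, and the "skip and report" policy are all mine, not anything that exists): index whatever is reachable at call time, and surface what was skipped instead of blocking on it.

    # Hypothetical sketch of option 1: index only what's reachable right
    # now, never block on an unreachable root, and report what was skipped.
    import os

    def index_reachable(roots):
        """Best-effort index mapping filename -> full path for whatever
        we can actually list right now; unreachable roots (unmounted
        shares, dead networks) are recorded rather than retried."""
        index, skipped = {}, []
        for root in roots:
            if not os.path.isdir(root):
                skipped.append(root)
                continue
            # onerror swallows mid-walk permission errors instead of dying
            for dirpath, _dirs, files in os.walk(root, onerror=lambda e: None):
                for name in files:
                    index[name] = os.path.join(dirpath, name)
        return index, skipped  # caller can honestly show "partial results"

    # e.g. one local dir plus a share that may or may not be mounted
    idx, missing = index_reachable(["/home/me/docs", "/mnt/team-share"])

The incompleteness is explicit (the skipped list) rather than hidden behind a long crunch.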

I’m exploring this “latency + partial visibility” constraint as a first-class requirement (more context in my HN profile/bio if you want to compare notes).

reply
To be very clear, there are networks out there that share absolutely nothing across the boundary. I'm maybe not your prime customer, but some people get very hung up on such things, and we go in circles about feasibility. So, since 2 is at times an impossibility, I'd prefer 1.
reply
Indexing everything becomes unbounded fast. Shrink scope to one source of truth and a small curated corpus. Capture notes in one repeatable format, tag by task, and prune on a fixed cadence. That keeps retrieval predictable and keeps the model inside constraints.
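
A minimal sketch of that loop, assuming placeholder names and a placeholder 90-day cutoff:

    # Sketch only: one note shape, task tags, age-based pruning.
    # The file name, fields, and 90-day cutoff are placeholders.
    import json, time

    CORPUS = "notes.jsonl"       # single source of truth
    MAX_AGE_DAYS = 90            # pruning cadence; pick what fits

    def capture(text, task):
        """Every note lands in the same repeatable shape."""
        with open(CORPUS, "a") as f:
            f.write(json.dumps({"ts": time.time(),
                                "task": task, "text": text}) + "\n")

    def prune():
        """Run on a fixed schedule; drops notes past the cutoff."""
        cutoff = time.time() - MAX_AGE_DAYS * 86400
        with open(CORPUS) as f:
            kept = [line for line in f if json.loads(line)["ts"] >= cutoff]
        with open(CORPUS, "w") as f:
            f.writelines(kept)

    def retrieve(task):
        """Predictable retrieval: filter on the task tag, newest first."""
        with open(CORPUS) as f:
            notes = [json.loads(line) for line in f]
        return sorted((n for n in notes if n["task"] == task),
                      key=lambda n: n["ts"], reverse=True)

Nothing clever, and that's the point: every step is bounded and repeatable.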
reply
That’s another strong point, and I think it’s the pragmatic default: shrink scope, keep one source of truth, enforce a repeatable format, and prune on a cadence. It’s basically how you keep both retrieval and any automation predictable.

The tension I’m trying to understand is that in a lot of real setups the “corpus” isn’t deliberately curated: it’s fragmented across machines, networks, and tools, and the opportunity cost of “move everything into one place” is exactly why people fall back to grep and ad-hoc search.

Do you think the right answer is always “accept the constraint and curate harder”, or is there a middle ground where you can keep sources where they are but still get reliable re-entry (even if it’s incomplete/partial)?

I’m collecting constraints like this as the core design input (more context in my HN profile/bio if you want to compare notes).

reply