We replaced RAG with a virtual filesystem for our AI documentation assistant

https://softwaredoug.com/blog/2026/01/08/semantic-search-wit...

[-]

The real thing I think people are rediscovering with file system based search is that there’s a type of semantic search that’s not embedding based retrieval. One that looks more like how a librarian organizes files into shelves based on the domain.

We’re rediscovering forms of search we’ve known about for decades. And it turns out they’re more interpretable to agents.

by wielebny2 hours ago|

[-]

Someone simply assumed at some point that RAG must be based on vector search, and everyone followed.

by softwaredoug2 hours ago|

https://maven.com/p/7105dc/rag-is-the-what-agentic-search-is...

[-]

It’s something of a historical accident

We started with LLMs when everyone in search was building question answering systems. Those architectures look like the vector DB + chunking we associate with RAG.

Agents' ability to call tools, using any retrieval backend, calls that into question.

We really shouldn’t start RAG with the assumption we need that. I’ll be speaking about the subject in a few weeks

by TeMPOraL2 hours ago|

[-]

Right. R in RAG stands for retrieval, and for a brief moment initially, it meant just that: any kind of tool call that retrieves information based on query, whether that was web search, or RDBMS query, or grep call, or asking someone to look up an address in a phone book. Nothing in RAG implies vector search and text embeddings (beyond those in the LLM itself), yet somehow people married the acronym to one very particular implementation of the idea.

by oceansky1 hours ago|

[-]

I'm still using the old definition, never got the memo.

by adfm49 minutes ago|

[-]

That’s OK. Most got REST wrong, too.

by rafterydj1 hours ago|

[-]

Stuck it on my calendar, looking forward to it.

by KPGv255 minutes ago|

[-]

You seem like someone who knows what they're doing, and I understand the theoretical underpinnings of LLMs (math background), but I have little kids that were born in 2016 and so the entire AI thing has left me in the dust. Never any time to even experiment.

I am active in fandoms and want to create a search where someone can ask "what was that fanfic where XYZ happened?" and get an answer back in the form of links to fanfiction that are responsive.

This is a RAG system, right? I understand I need an actual model (that's something like ollama), the thing that trawls the fanfiction archive and inserts whatever it's supposed to insert into one of these vector DBs, and I need a front-facing thing I write, that takes a user query, sends it to ollama, which can then search the vector DB and return results.

Or something like that.

Is it a RAG system that solves my use case? And if so, what software might I go about using to provide this service to me and my friends? I'm assuming it's pretty low in resource usage since it's just text indexing (maybe indexing new stuff once a week).

The goal is self-hosting. I don't wanna be making monthly payments indefinitely for some silly little thing I'm doing for me and my friends.

I am just a stay at home dad these days and don't have anyone to ask. I'm totally out the tech game for a few years now. I hope that you could respond (or someone else could), and maybe it will help other people.

There's just so many moving parts these days that I can't even hope to keep up. (It's been rather annoying to be totally unable to ride this tech wave the way I've done in the past; watching it all blow by me is disheartening).

by 9dev27 seconds ago|

[-]

In the definition of RAG discussed here, that means the workflow looks something like this (simplified for brevity): When you send your query to the server, it will first normalise the words, then convert them to vectors, or embeddings, using an embedding model (there are also plain stochastic mechanisms to do this, but today most people mean a purpose-built LLM). An embedding is essentially an array of numeric coordinates in a huge-dimensional space, so [1, 2.522, …, -0.119].

It can now use that to search a database of arbitrary documents with pre-generated embeddings of their own. These are usually generated when the documents are inserted into the database, following the same process as your search query above, so every record in the database has its own, discrete set of embeddings to be queried during searches.

The important part here is that you now don’t have to compare strings anymore (like looking for occurrences of the word "fanfiction" in the title and content), but instead you can perform arbitrary mathematical operations to compare query embeddings to stored embeddings: 1 is closer to 3 than to 7, and in the same way, fanfiction is closer to romance than it is to biography. Now, if you rank documents by that proximity and take the top 10 or so, you end up with the documents most similar to your query, and thus the most relevant.
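To make the ranking step concrete, here is a minimal sketch in Python. The three-dimensional vectors are made up for illustration (a real embedding model would produce hundreds or thousands of dimensions), and cosine similarity is just one common choice of proximity measure, not necessarily what any particular vector DB uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional "embeddings"; a real model would generate these.
documents = {
    "romance fanfic": [0.9, 0.8, 0.1],
    "adventure fanfic": [0.8, 0.3, 0.4],
    "biography": [0.1, 0.2, 0.9],
}

# Pretend this vector is the embedding of the user's query.
query_embedding = [0.85, 0.7, 0.15]

def top_k(query, docs, k=2):
    """Rank document names by similarity to the query and keep the best k."""
    ranked = sorted(
        docs,
        key=lambda name: cosine_similarity(query, docs[name]),
        reverse=True,
    )
    return ranked[:k]

print(top_k(query_embedding, documents))
```

With these toy numbers the two fanfic entries rank above the biography, which is the "closer in space means more relevant" idea in miniature.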

I hope this helps :-)

by johnathandos46 minutes ago|

[-]

I think the example you give is a little backwards — a RAG system searches for relevant content before sending anything to the LLM, and includes any content retrieved this way in the generative prompt. User query -> search -> results -> user query + search results passed in same context to LLM.
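A rough sketch of that flow, in Python. The `search` function here is a deliberately naive word-overlap retriever standing in for whatever backend you use (vector DB, full-text index, grep), and the prompt template is an assumption of mine; the actual LLM call is left out.

```python
def search(query, corpus, k=2):
    """Stand-in retriever: rank documents by how many query words they share."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(text.lower().split())), text) for text in corpus]
    scored = [pair for pair in scored if pair[0] > 0]   # drop non-matching docs
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

def build_prompt(query, results):
    """Put the retrieved text and the user query into one context for the LLM."""
    context = "\n".join(f"- {text}" for text in results)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

corpus = [
    "A fic where the dragon adopts a lost knight",
    "A coffee shop story set in space",
    "The knight and the dragon open a bakery",
]

query = "dragon knight bakery"
results = search(query, corpus)          # user query -> search -> results
prompt = build_prompt(query, results)    # user query + results -> one prompt
print(prompt)                            # this string is what goes to the LLM
```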

by ivanovm35 minutes ago|

[-]

I don't think this was a simple assumption. LLMs used to be much dumber! GPT-3 era LLMs were not good at grep, they were not that good at recovering from errors, and they were not good at making followup queries over multiple turns of search. Multiple breakthroughs in code generation, tool use, and reasoning had to happen on the model side to make vector-based RAG look like unnecessary complexity.

by bluegatty1 hours ago|

[-]

It was the terminology that did that more than anything. The term 'RAG' just has a lot of consequential baggage. Unfortunately.

by morkalork2 hours ago|

[-]

Doesn't have to be tho, I've had great success letting an agent loose on an Apache Lucene instance. Turns out LLMs are great at building queries.

by czhu121 hours ago|

1: https://github.com/VectifyAI/PageIndex

[-]

Similar effort with PageIndex [1], which basically creates a table-of-contents-like tree. Then an LLM traverses the tree to figure out which chunks are relevant for the context in the prompt.
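The general idea of walking a table-of-contents tree can be sketched like this in Python. This is not PageIndex's API: the `Node` class, the sample tree, and the `relevance` function are all hypothetical, and `relevance` is a crude word-overlap stand-in for the judgement an LLM would make at each level.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    title: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)

def relevance(title, question):
    """Stand-in for the LLM's judgement: count shared lowercase words."""
    return len(set(title.lower().split()) & set(question.lower().split()))

def find_leaf(node, question):
    """Descend from the root, at each level following the most relevant child."""
    while node.children:
        node = max(node.children, key=lambda child: relevance(child.title, question))
    return node

root = Node("docs", children=[
    Node("authentication guide", children=[
        Node("api keys", "Create keys in the dashboard."),
        Node("oauth setup", "Register a redirect URI."),
    ]),
    Node("deployment guide", children=[
        Node("custom domains", "Point a CNAME at the host."),
    ]),
])

leaf = find_leaf(root, "how do I do oauth setup for authentication")
print(leaf.title, "->", leaf.text)
```

Only the section titles along one path are inspected, so the amount of text the model reads grows with the depth of the tree rather than the size of the corpus.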

by khalic2 hours ago|

[-]

This kind of circles back to ontological NLP, which used knowledge representation as a primitive for language processing. There is _a ton_ of work in that direction.

by softwaredoug2 hours ago|

[-]

Exactly. And LLMs supervised by domain experts unlock a lot of capabilities to help with these types of knowledge organization problems.


[-]

I think it's cool that LLMs can effectively do this kind of categorization on the fly at relatively large scale. When you give the LLM tools beyond just "search", it really is effectively cheating.

by UltraSane2 hours ago|

[-]

Inverted indexes have the major advantage of supporting Boolean operators.
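A toy illustration in Python of why that falls out naturally: if each token maps to the set of documents containing it, AND, OR and NOT are just set operations. The tokenizer (lowercase, split on whitespace) and the sample documents are simplifications for the example, not how Lucene or any real engine does it.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

docs = {
    1: "vector search with embeddings",
    2: "keyword search with boolean operators",
    3: "filesystem tools for agents",
}
index = build_index(docs)
all_ids = set(docs)

both = index["search"] & index["boolean"]        # search AND boolean
either = index["embeddings"] | index["agents"]   # embeddings OR agents
excluded = all_ids - index["search"]             # NOT search

print(both, either, excluded)
```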

by whattheheckheck2 hours ago|

[-]

Turns out the millions of people in knowledge work aren't librarians and they wing shit everywhere

by pwr17 minutes ago|

[-]

This mirrors something we ran into building an AI pipeline for audio content. The problem with traditional RAG is that chunking destroys the structure that actually matters — you end up retrieving fragments that are semantically similar but contextually useless.

The filesystem metaphor works because it preserves hierarchy. Documents have sections, sections have relationships, and those relationships carry meaning that gets lost when you flatten everything into embeddings.

Curious how this handles versioning though. Docs change constantly and stale context fed to an LLM is arguably worse than no context at all.

by sunir58 minutes ago|

[-]

I am really enjoying this renaissance in CLI world applications. There's so much possible.

I'm working on a related challenge which is mounting a virtual filesystem with FUSE that mirrors my Mac's actual filesystem (over a subtree like ~/source), so I can constrain the agents within that filesystem, and block destructive changes outside their repo.

I have it so every repo has its own long-lived agent. They do get excited and start changing other repos, which messes up memory.

I didn't want to create a system user per repo because that's obnoxious, so I created a single claude system user, and I am using the virtual file system to manage permissions. My gmail repo's agent can for instance change the gmail repo and the google_auth repo, but it can't change the rag repo.
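The permission rule itself is simple to state in code. This Python sketch is only an illustration of the allowlist idea, not how bashguard works: the `WRITABLE_REPOS` table, the `SOURCE_ROOT` path, and the `may_write` helper are all hypothetical names.

```python
from pathlib import PurePosixPath

# Hypothetical allowlist: which repos each repo's agent may modify.
WRITABLE_REPOS = {
    "gmail": {"gmail", "google_auth"},
    "rag": {"rag"},
}
SOURCE_ROOT = PurePosixPath("/home/me/source")  # assumed location of the subtree

def may_write(agent, path):
    """Return True if `path` is inside a repo this agent is allowed to change."""
    target = PurePosixPath(path)
    if ".." in target.parts:
        return False  # refuse paths that try to climb out of a repo
    try:
        relative = target.relative_to(SOURCE_ROOT)
    except ValueError:
        return False  # outside the mirrored subtree entirely
    if not relative.parts:
        return False  # the root itself is not inside any repo
    return relative.parts[0] in WRITABLE_REPOS.get(agent, set())

print(may_write("gmail", "/home/me/source/google_auth/token.py"))  # allowed
print(may_write("gmail", "/home/me/source/rag/index.py"))          # blocked
```

A real filesystem layer would also have to deal with symlinks and renames, which a pure path check like this does not see.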

Edit: I'm publishing it here. It's still under development. https://github.com/sunir/bashguard

by Galanwe2 hours ago|

[-]

I am not familiar with the tech stack they use, but from an outsider's point of view, I was sort of expecting some kind of FUSE solution. Could someone explain why they went through a fake shell? There has to be a reason.

[-]

100% agree a FUSE mount would be the way to go given more time and resources.

Putting Chroma behind a FUSE adapter was my initial thought when I was implementing this but it was way too slow.

I think we would also need to optimize grep even if we had a FUSE mount.

This was easier in our case: we didn’t need 100% POSIX compatibility for our read-only docs use case, because the agent only used a subset of bash commands to traverse the docs anyway. It also avoids the extra infra overhead of maintaining EC2 nodes/sandboxes for the agent to use.

by readitalready52 minutes ago|

[-]

Yah my Claude Code agents run a ton of Python and bash scripts. You're probably missing out on a lot of tool use cases without full tool use through POSIX compatibility.

by Galanwe1 hours ago|

[-]

Makes sense, thanks for clarifying!

by seanlinehan2 hours ago|

[-]

This is definitely the way. There are good use cases for real sandboxes (if your agent is executing arbitrary code, you'd better have it do so in an air-gapped environment).

But the idea of spinning up a whole VM to use unix IO primitives is way overkill. Makes way more sense to let the agent spit out unix-like tool calls and then use whatever your prod stack uses to do IO.

[-]

100% agree. However, if there were no resource tradeoffs, then a FUSE mount would probably be the way to go.

by tylergetsay1 hours ago|

[-]

I don't understand the additional complexity of mocking bash when they could just provide grep, ls, find, etc. tools to the LLM

[-]

I agree that would have been the way to go given more time and resources. However, setting up a FUSE mount would have taken significantly longer and required additional infrastructure.

by wahnfrieden1 hours ago|

[-]

agents are trained on bash grep/ls/find, not on tool-calling grep/ls/find


by pboulos2 hours ago|

[-]

I think this is a great approach for a startup like Mintlify. I do have skepticism around how practical this would be in some of the “messier” organisations where RAG stands to add the most value. From personal experience, getting RAG to work well in places where the structure of the organisation and the information contained therein is far from hierarchical or partition-able is a very hard task.

by khalic2 hours ago|

[-]

The use case is well defined here; let’s not jump the gun. Text search, like with code, is a relatively simple problem compared to intrinsic semantic content in a book, for example. I think the moral here is that RAG is not a silver bullet; the Claude Code team came to the same conclusion.

by pboulos57 minutes ago|

[-]

I agree with your assessment.

by GandalfHN42 minutes ago|

[-]

Layering a virtual FS over a spaghetti-doc org is an indexer in drag, and you still need access control or it's a compliance disaster.

[-]

Modern OCR tooling is quite good. If the knowledge you are adding into your search database is able to be OCR'd then I think the approach we took here is able to be generalized.

by jdthedisciple50 minutes ago|

[0] https://news.ycombinator.com/item?id=14550060

[-]

But SQLite is notoriously 35% faster than the filesystem [0], so why not use that?

by tomComb5 minutes ago|

[-]

And Turso has built a Virtual Filesystem on top of their SQLite.

AgentFS https://agentfs.ai/ https://github.com/tursodatabase/agentfs

Which sounds like a great idea, except that it uses NFS instead of FUSE (note that macFUSE now has an FSKit backend, so FUSE seems like the best solution for both Mac and Linux).

by kenforthewin1 hours ago|

[-]

I don't get it - everybody in this thread is talking about the death of vector DBs and files being all you need. The article clearly states that this is a layer on top of their existing Chroma db.

by dominotw1 hours ago|

[-]

what value is chromadb adding in that setup

[-]

yea chromadb is not the point. multiple data storage solutions work

by kenforthewin56 minutes ago|

[-]

I see .. so you're not using the vectors at all. Where are the evaluations showing this chromaFS approach is performing better than vectors?

by bluegatty1 hours ago|

[-]

RAG should not have been represented as a context tool, but rather just vector querying as a variation of search/query - and that's it.

We were bitten by our own nomenclature.

Just a small variation in chosen acronym ... may have wrought a different outcome.

Different ways to find context are welcome, we have a long way to go!

[-]

agreed!

by dmix2 hours ago|

[-]

This puts a lot of LLM in front of the information discovery. That would require far more sophisticated prompting and guardrails. I'd be curious to see how people architect an LLM->document approach with tool calling, rather than RAG->reranker->LLM. I'm also curious what the response times are like since it's more variable.

[-]

Hmmm, the post is an attempt to explain that Mintlify migrated from embedding-retrieval->reranker->LLM to an agent loop with access to call POSIX tools as it desires. Perhaps we didn't provide enough detail?

by dmix1 hours ago|

[-]

That matches what I'm curious about. Where an LLM is doing the bulk of information discovery and tool calling directly. Most simpler RAGs have an LLM on the frontend mostly just doing simpler query clean up, subqueries and taxonomy, then again later to rerank and parse the data. So I'd imagine the prompting and guardrails part is much more complicated in an agent loop approach, since it's more powerful and open ended.

by mandeepj2 hours ago|

[-]

> even a minimal setup (1 vCPU, 2 GiB RAM, 5-minute session lifetime) would put us north of $70,000 a year based on Daytona's per-second sandbox pricing ($0.0504/h per vCPU, $0.0162/h per GiB RAM)

$70k?

how about if we round off one zero? Give us $7000.

That number still seems to be very high.
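For what it's worth, working backwards from the quoted prices shows what volume the $70,000 figure implies. This is only arithmetic on the numbers in the quote; the assumptions (mine, not the article's) are that every session runs the full five minutes and that nothing else is billed.

```python
VCPU_PER_HOUR = 0.0504   # USD per vCPU-hour, from the quote
GIB_PER_HOUR = 0.0162    # USD per GiB-hour of RAM, from the quote
YEARLY_BUDGET = 70_000   # USD, the figure from the quote

hourly = 1 * VCPU_PER_HOUR + 2 * GIB_PER_HOUR    # 1 vCPU + 2 GiB RAM
per_session = hourly * 5 / 60                    # one 5-minute session
sessions_per_year = YEARLY_BUDGET / per_session
sessions_per_month = sessions_per_year / 12

print(f"hourly cost:        ${hourly:.4f}")
print(f"cost per session:   ${per_session:.4f}")
print(f"sessions per year:  {sessions_per_year:,.0f}")
print(f"sessions per month: {sessions_per_month:,.0f}")
```

So each session costs well under a cent, and the yearly total only gets that large at roughly ten million sessions a year.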

by lstodd1 hours ago|

[-]

Hm. I think a dedicated 16-core box with 64 GB of RAM can be had for under $1000/year.

It being dedicated, there are no limits on session lifetime, and it'd run 16 of those sessions no problem, so the real price should be around $70/year for that load.

by maille2 hours ago|

[-]

Let's say I want a free, simple solution, using a local or free-tier LLM, to search information mostly from my emails and a little bit from text, doc and pdf files. Are there any tools I should try so that ollama or gemini can reply using my own knowledge base?

by ghywertelling1 hours ago|

[-]

https://onyx.app/

This could be useful.

by tschellenbach1 hours ago|

[-]

I think generally we are going from vector based search, to agentic tool use, and hierarchy based systems like skills.

by ghywertelling1 hours ago|

https://huggingface.co/docs/smolagents/en/examples/rag

[-]

Agents doing retrieval has been around for quite a while

> Agentic RAG: A More Powerful Approach. We can overcome these limitations by implementing an Agentic RAG system - essentially an agent equipped with retrieval capabilities. This approach transforms RAG from a rigid pipeline into an interactive, reasoning-driven process.

The innovation of the blogpost is in the retrieval step.

[-]

Vector search has moved from a "complete solution" to just one tool among many which you should likely provide to an agent.

by devops00038 minutes ago|

[-]

Why not a simple full text search in Postgres ?

by dust421 hours ago|

[-]

If grep and ls do the trick, then sure you don't need RAG/embeddings. But you also don't need an LLM: a full text search in a database will be a lot more performant, faster and use less resources.

by badgersnake26 minutes ago|

[-]

So you did GraphRAG but your graph is a filesystem tree.

by HanClinto1 hours ago|

[-]

> "The agent doesn't need a real filesystem; it just needs the illusion of one. Our documentation was already indexed, chunked, and stored in a Chroma database to power our search, so we built ChromaFs: a virtual filesystem that intercepts UNIX commands and translates them into queries against that same database. Session creation dropped from ~46 seconds to ~100 milliseconds, and since ChromaFs reuses infrastructure we already pay for, the marginal per-conversation compute cost is zero."

Not to be "that guy" [0], but (especially for users who aren't already in ChromaDB) -- how would this be different for us from using a RAM disk?

> "ChromaFs is built on just-bash ... a TypeScript reimplementation of bash that supports grep, cat, ls, find, and cd. just-bash exposes a pluggable IFileSystem interface, so it handles all the parsing, piping, and flag logic while ChromaFs translates every underlying filesystem call into a Chroma query."

It sounds like the expected use-case is that agents would interact with the data via standard CLI tools (grep, cat, ls, find, etc), and there is nothing Chroma-specific in the final implementation (? Do I have that right?).
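To illustrate what "translating filesystem calls into queries" could look like, here is a minimal Python sketch. It is not ChromaFs or just-bash: the `STORE` dict is a hypothetical stand-in for the document database, and `run` only handles plain `cmd arg...` strings with no pipes, flags, or quoting.

```python
import re

# Hypothetical stand-in for the document store: path -> page text.
STORE = {
    "/docs/quickstart.md": "Install the CLI.\nRun the dev server.",
    "/docs/api/auth.md": "Send the API key in a header.",
    "/docs/api/errors.md": "A 401 means the API key is missing.",
}

def ls(directory):
    """List immediate children of `directory`, derived from stored paths."""
    prefix = directory.rstrip("/") + "/"
    names = set()
    for path in STORE:
        if path.startswith(prefix):
            names.add(path[len(prefix):].split("/")[0])
    return sorted(names)

def cat(path):
    """Return the stored text for an exact path."""
    return STORE[path]

def grep(pattern, directory="/"):
    """Return (path, line) pairs under `directory` whose line matches `pattern`."""
    prefix = directory.rstrip("/") + "/"
    regex = re.compile(pattern)
    hits = []
    for path in sorted(STORE):
        if path.startswith(prefix):
            for line in STORE[path].splitlines():
                if regex.search(line):
                    hits.append((path, line))
    return hits

COMMANDS = {"ls": ls, "cat": cat, "grep": grep}

def run(command_line):
    """Dispatch a simple 'cmd arg...' string to the matching function."""
    name, *args = command_line.split()
    return COMMANDS[name](*args)

print(run("ls /docs"))
print(run("grep 401 /docs"))
```

Nothing here touches a disk: the "directories" exist only as path prefixes, which is why the backing store could be a database, a RAM disk, or anything else that can answer these three kinds of lookups.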

The author compares the speeds against the Chroma implementation vs. a physical HDD, but I wonder how the benchmark would compare against a Ramdisk with the same information / queries?

I'm very willing to believe that Chroma would still be faster / better for X/Y/Z reason, but I would be interested in seeing it compared, since for many people who already have their data in a hierarchical tree view, I bet there could be some massive speedups by mounting the memory directories in RAM instead of HDD.

[0] - https://news.ycombinator.com/item?id=9224

[-]

We would also be super interested to see that comparison. I agree that there isn't a specific reason why Chroma would be required to build something like this.

by yieldcrv35 minutes ago|

[-]

I love the multipronged attack on RAG

RIP RAG: lasted one year as a skillset that recruiters would list on job descriptions, collectively shut down by industry professionals

by jrm41 hours ago|

[-]

Is this related to that thing where somehow the entire damn world forgot about the power of boolean (and other precise) searching?

by ctxc1 hours ago|