Filesystems are having a moment

(madalitso.me)

161 points

by malgamves12 hours ago |

99 comments

by staplung3 hours ago|

[-]

Not knocking the article in any way but from the headline I was expecting - perhaps hoping - this would be about some innovation in filesystems research like it was the 90's again. That's not what this is.

It's about how filesystems as they are (and have been for decades) are proving to be powerful tools for LLMs/agents.

by alecco2 hours ago|

parent|

[-]

And by filesystem they mean CLI (command line interface) and a full *nix system. Like the hundreds of similar articles about it for the past year said.

by Gigachad1 hours ago|

parent|

prev|

[-]

I feel like every article on HN now disguises itself as interesting but the content is just the same boring AI slop.

by palata48 minutes ago|

parent|

[-]

I have been reading HN for a few years, and my feeling is that I find fewer and fewer interesting articles. Maybe it's just me, and the average articles are the same quality.

Now I tend to skim through it to see if a title looks like it may bring interesting discussions, and then I skim through the discussions. Because there are very knowledgeable people who sometimes share valuable insights.

Interestingly, last time I asked a question, hoping to get interesting people to share insights, I was answered that I "should learn how to use an LLM instead of asking questions" :-).

by lozenge16 minutes ago|

parent|

prev|

[-]

I don't think the content is AI slop at all. It just happens to be about AI.

by XorNot6 minutes ago|

parent|

[-]

At this point those are about the same in terms of quality.

"I spent money writing *a prompt* and this is what I learned".

by fragmede2 hours ago|

parent|

prev|

[-]

Yeah, none of it was really about file systems. There was a brief mention that file systems look like a graph, and that you build roughly an index so it looks graph-like and thus database-y, but for all the detail about file systems this post went into, you could store it all in a SQLite database with a column called filename and a column called content. I too was expecting something more in-depth about file systems; for instance, cluster file systems have made little to no advancement. ZFS is not a cluster file system and we've been needing a good one of those for decades, ever since VMs became feasible on consumer-grade hardware. Still, files on disk are better than having to pay Oracle a fee per skill on today's modern, open Internet. That was never going to happen.
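To make the two-column idea concrete, here is a minimal sketch in Python using the standard sqlite3 module. The table name `files` and the helper names are assumptions chosen for illustration; the comment only specifies a filename column and a content column.

```python
import sqlite3

def build_store(files):
    """Load a {filename: content} mapping into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per file, keyed by its full path.
    conn.execute("CREATE TABLE files (filename TEXT PRIMARY KEY, content TEXT)")
    conn.executemany("INSERT INTO files VALUES (?, ?)", files.items())
    return conn

def read_file(conn, filename):
    """Return the content stored under filename, or None if absent."""
    row = conn.execute(
        "SELECT content FROM files WHERE filename = ?", (filename,)
    ).fetchone()
    return row[0] if row else None

def list_dir(conn, prefix):
    """Emulate a directory listing with a prefix match on the path.

    Note: LIKE treats '_' and '%' in the prefix as wildcards, so this
    sketch assumes directory names without those characters.
    """
    rows = conn.execute(
        "SELECT filename FROM files WHERE filename LIKE ? ORDER BY filename",
        (prefix.rstrip("/") + "/%",),
    )
    return [r[0] for r in rows]
```

A "directory" here is nothing more than a shared path prefix, which is roughly the point being made: the hierarchy can be recovered from the names alone.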

by mangogogo3 hours ago|

parent|

prev|

[-]

i was hoping the same, but then it turned out to be another article about LLMs.

by tacitusarc7 hours ago|

prev|

[-]

Does everyone just use AI to write these days? Or is the style so infectious that I just see it everywhere? I swear there needs to be some convention around labeling a post with how much AI was used in its creation.

by heavyset_go5 hours ago|

parent|

[-]

I'd be embarrassed to put my name on AI prose without a disclaimer and I'd also be annoyed to read it as a reader.

IMO it's insulting to the audience, it says your time and attention is not worthy of the author's own time and attention spent putting their own thoughts in their own words.

If you're going to do that at least mention it's LLM output or just give me your outline prompts. I don't care what your LLM has to say, I'm capable of prompting your outline in my own model myself if I feel like it.

by josephg2 hours ago|

parent|

[-]

> If you're going to do that at least mention it's LLM output

Yes, this! Please label AI generated content. Pull request written by an AI? Label it as ai generated. Blog post? Article generated with AI? Say so! It’s ok to use AI models. Especially if English is your second language. But put a disclaimer in. Don’t make the reader guess.

Eg:

> This content was partially generated by chatgpt

> Blog post text written entirely by human hand, code examples by Claude code

by fragmede1 hours ago|

parent|

prev|

[-]

Have any outlines you'd care to share?

by coliveira4 hours ago|

parent|

prev|

[-]

I'm not a fan of AI and try to avoid it, but there is a difference between AI output published by someone knowledgeable and any other AI output that you run by yourself. If an expert looked at the result and found it to be ok, then you can have some assurance that it at least makes sense. Your own AI run doesn't mean anything, it could be 100% hallucination and a non-expert will buy it as truth.

by Joel_Mckay4 hours ago|

parent|

[-]

Unfortunately, LLM slop now makes up >53% of the web, and is growing.

It is easy to spot the compacted token distribution unique to each model, but search engines still seem to promote nonsense content. =3

"Bad Bot Problem - Computerphile"

https://www.youtube.com/watch?v=AjQNDCYL5Rg

"A Day in the Life of an Ensh*ttificator "

https://www.youtube.com/watch?v=T4Upf_B9RLQ

by sethev6 hours ago|

parent|

prev|

[-]

LLMs were trained on stuff that people wrote. I get there are "tells", but don't really think people are as good at identifying AI generated text as they think they are...

by afro884 hours ago|

parent|

[-]

I wouldn't have picked this article as AI until I got an agent to do some writing for me and read a bunch of it to figure out if I can stand behind it. Now I see the tells everywhere. "It's not this. It's that." is particularly common and I can't unsee it. (FWIW I rewrote most of the writing it generated, but it did help me figure out my structure and narrative)

The problem I think with AI generated posts is that you feel like you can't trust the content once it's AI. It could be partly hallucinated, or misrepresented.

by sethev1 hours ago|

parent|

[-]

Yeah, but "it's not X. It's Y" is a common idiom that LLMs picked up from people. That's the point i was making. And it's starting to feel like every post has at least one comment claiming that it was AI generated.

by antonvs4 hours ago|

parent|

prev|

[-]

Good chunks of the article don't trigger this for me, but I would bet money on the final paragraph involving AI:

> That's not a technical argument. It's a values argument. And it's one that the filesystem, for all its age and simplicity, is uniquely positioned to serve. Not because it's the best technology. But because it's the one technology that already belongs to you.

by adi_kurian1 hours ago|

parent|

prev|

[-]

Contractions

by computably2 hours ago|

parent|

prev|

[-]

You don't have to be good at identifying AI generated text to detect low-effort slop.

by malgamves4 hours ago|

parent|

prev|

[-]

As the author I can assure you there’s a human behind these words. Interesting times we live in though. I often find myself questioning what’s AI and what’s not too, and at the moment we’ve offloaded that responsibility to the good will of authors or platform policy, which might have to change soon.

by meindnoch3 hours ago|

parent|

[-]

"there’s a human behind these words"

That's a bit vague. Was the article written without the aid of LLMs? Yes or no.

by torginus3 hours ago|

parent|

[-]

Well, if the words were 100% hand-written, I assume he'd have said that.

by green-salt1 hours ago|

parent|

prev|

[-]

Nice dodge! Unfortunately, this made it more obvious.

by jonmagic37 minutes ago|

parent|

prev|

[-]

I thought it was a great post tying a lot of things I’ve been reading and thinking about together. Couldn't care less if you used AI if it helps my brain expand and/or make connections I wouldn’t have otherwise.

by lovecg3 hours ago|

parent|

prev|

[-]

As in, you used 0 AI to write or edit this text? Or some AI? I’d like to calibrate myself.

by grey-area2 hours ago|

parent|

[-]

We all know the answer to that.

by q3k7 hours ago|

parent|

prev|

[-]

Everyone's trying to be the new thought leader enlightened technical essayist. So much fluff everywhere.

by orsorna6 hours ago|

parent|

[-]

What's wild is that with a few minutes of manual editing it would give exponential return. For instance, a lead sentence in your section saying "here's why X" that was already described by your subheading is unnecessary and could have been wholly removed.

by amarant5 hours ago|

parent|

[-]

Exponential return? This is the front page of HN! What do exponential returns even look like?

Are you saying this post is a few edits away from becoming a New York Times bestseller?

by orsorna4 hours ago|

parent|

[-]

No, I guess I meant editing to approach a text that doesn't look rushed over (LLM generation is a subset of such poor writings)

But you're right, it did hit the front page, and that says more about my sensibilities not lining up with whoever is voting the article up.

by gzread6 hours ago|

parent|

prev|

[-]

You'd have to have a good idea of how you want the document to read, which is half (or more) of the process of writing it.

by antonvs5 hours ago|

parent|

prev|

[-]

IME many people aren't very capable of editing their own work effectively. It's why "editor" exists as a profession.

by idiotsecant6 hours ago|

parent|

prev|

[-]

This doesn't seem particularly AI slopped to me.

by einr3 hours ago|

parent|

[-]

"Not bigger than databases. Different from databases.

It's not a website you go to — it's a little spirit that lives on your machine.

Not a chatbot. A tool that reads and writes files on your filesystem.

That's not a technical argument. It's a values argument."

by goodmythical5 hours ago|

parent|

prev|

[-]

Does everyone just complain about people using the tools they like to use these days? Or is the style so infectious that I just see it everywhere? I swear there needs to be some convention around labeling a post with how much whining was used in its creation.

by panarky3 hours ago|

parent|

prev|

[-]

Does everyone just easily accuse genuine, literate humans of "cheating" with AI when there's no way they could know that?

There are a lot of unique aspects of the writing in this post that LLMs don't typically generate on their own.

And there's not a "delve" or "tapestry" or even a bullet point to be found.

Also, accusations and complaints like this are off-topic and uninteresting.

We should be talking about filesystems here, not your gut instinct AI detector that has a sky-high false-positive rate.

I swear there needs to be some convention around throwing wild accusations at people you don't know based exclusively on vibes and with zero actual evidence.

by korbatz7 hours ago|

prev|

[-]

I was having the exact same observation, albeit from a slightly different perspective: SaaS. This is where, as the code tends to be temporary and very domain-specific, the data (files) must strive to be boring standards.

The problem today is that we build specific, short-lived apps that lock data into formats only they can read. If you don't use universal formats, your system is fragile. We can still open JPEGs from 1995 because the files don't depend on the software used to make them. Using obscure or proprietary formats is just technical debt that will eventually kill your project. File or forget.

by jmathai6 hours ago|

parent|

[-]

My 10+ year old photo management system [1] relies on the file system and EXIF as the source of truth for my entire photo library.

It’s proven several times over that it’s the correct approach. Abstractions (formerly Google photos, currently Immich) should just be built on top - but these proprietary databases are only for convenience.

For work, I’m having the same experience as the author and everything is just markdown and csv files for Claude Code (for research and document writing).

[1] https://github.com/jmathai/elodie

by whartung4 hours ago|

parent|

[-]

I know some systems leverage the modern file meta data (extended attributes), but it's clearly not successful enough that folks can use them for an application like this.

Ostensibly, things like MacOS Spotlight can bring real utility and value to the file system, and extended attributes through the sidecar indexing, etc. But Spotlight is infamous for its unreliability.

The other issue with file systems is simply that the user (potentially) has "direct access" to them, in that they can readily move files in and up and around whimsically. The "structure" is laid bare for them to potentially interfere with, or, such as the case with the extended attributes, drag a file to a USB fob, and then copy it back -- inadvertently removing those attributes.

And that's how we end up with everything being stuffed into a SQLite DB.

by zenoprax4 hours ago|

parent|

prev|

[-]

I have your repo starred from a post/comment you made a few weeks ago but haven't had time to actually use/integrate it with my own stuff.

What are your thoughts on XMP sidecar files? I'm torn right now between digital negative + external metadata versus all-in-one image with mutable properties. Portability vs. Durability etc.

by jmathai36 minutes ago|

parent|

[-]

I've avoided using XMP sidecars. Mostly because I don't want to have to worry about two files for every photo. And I don't think they're ubiquitously supported like EXIF.

Thanks for starring the repo and let me know if you need any help.

by alanbernstein4 hours ago|

parent|

prev|

[-]

Thanks for sharing, I might have too much NIH syndrome to use it but I'd love to check it out.

by jmathai36 minutes ago|

parent|

[-]

Ha! I totally get it. Use it for inspiration though!

by Gigachad36 minutes ago|

parent|

prev|

[-]

The frustrating thing about photo management these days is how every major photo library app/cloud service these days stores every edit / tag / album externally. If you crop a photo, change the taken at date, etc, the original file never gets touched but an external bit of metadata is created. So any time you move platform, all of these edits and your albums are erased.

It is convenient to be able to undo crops or filters, but I wish the industry would standardize so these changes are portable.

by hmokiguess5 hours ago|

prev|

[-]

Notable mention: Plan 9 from Bell Labs.

https://en.wikipedia.org/wiki/Plan_9_from_Bell_Labs

by mieubrisse3 hours ago|

parent|

[-]

I'm building an agent orchestrator (plug: https://github.com/mieubrisse/agenc) and asked Claude what prior art exists.

It pulled back Plan 9, and I was shocked: this is exactly what we need today, as I'm convinced we need to think about minimizing agent permissions the exact same way companies do. Plan 9 was just too early.

by packetlost3 hours ago|

prev|

[-]

We once again discover that Plan9 and UNIX were right. The most powerful, lowest common denominator interface is text files exposed over a file system. Now to get back to making 9p2026.

The article gets some fundamentals completely wrong though: file systems are full graphs, not strict trees and are definitely not acyclic
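The claim about cycles can be checked directly. The sketch below, in Python, runs a depth-first search that follows symbolic links and reports whether any link leads back to a directory that is still on the current path. The function name `has_cycle` is an assumption for illustration, and the sketch only covers cycles created by symlinks, since hard links to directories are typically disallowed.

```python
import os
import tempfile

def has_cycle(path, on_path=None, done=None):
    """Depth-first search over directories, following symlinks.

    Returns True if some directory is reachable from itself, i.e. the
    namespace under `path` contains a cycle rather than being a tree.
    """
    on_path = set() if on_path is None else on_path  # directories on the current DFS path
    done = set() if done is None else done           # directories fully explored
    real = os.path.realpath(path)
    if real in on_path:
        return True          # reached an ancestor again: a cycle
    if real in done:
        return False         # shared subtree (a DAG), already known to be cycle-free
    on_path.add(real)
    with os.scandir(real) as entries:
        # DirEntry.is_dir() follows symlinks by default.
        children = [entry.path for entry in entries if entry.is_dir()]
    found = any(has_cycle(child, on_path, done) for child in children)
    on_path.discard(real)
    done.add(real)
    return found
```

Calling `has_cycle` on a plain directory tree returns False; adding a symlink that points back at an ancestor makes it return True, which is why recursive tools usually need an explicit option to decide whether to follow links.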

by andai29 minutes ago|

parent|

[-]

So what are Plan 9's killer features, and can they be bolted on with FUSE or is there a deeper magic at play?

by packetlost15 minutes ago|

parent|

[-]

Plan9 doesn't really have a single killer feature beyond 9P and the universal consistency and simplicity of its APIs. It has a very clean syscall interface and takes "everything is a file" to its logical conclusion and does it well (IMO). Pretty much everything is a file(system) and it's all accessed via the 9P protocol.

You could sorta bolt these features on with FUSE, but to see real benefits you'd want something closer to Inferno, which is like an OS/application runtime that runs on top of another OS host.

In my mind, the security model is the closest thing to a killer feature it has. Because everything is a file(system) and the fork/rfork and bind syscalls let you precisely control what resources/files/services/etc. a child process has access to via easily understandable shell commands (or using libc functions if you want), it means you don't need special APIs for namespacing (i.e. containers) and access controls. It's very clean. When a parent process forks or spawns a child process, it can choose whether that process inherits the namespace or gets a clean slate that it can then bind filesystems onto, controlling precisely what it has access to.

by tasuki13 minutes ago|

prev|

[-]

> He pointed out that Claude Code works because it runs on your computer, with your environment, your data, your context.

Ah yes - I hate that. Yes it "works", but I don't want things to only work on my machine: I want them to work everywhere.

I was wondering why Google's Jules wasn't more popular, and I guess this is why. My preference for my code to work in different environments is unusual.

by largbae2 hours ago|

prev|

[-]

I think this article just speaks to the immaturity of our use of AI at this "moment."

Production grade systems might be written by agents running on filesystem skills, but the production systems themselves will run on consistent and scalable data structures.

Meanwhile the UI of AI agents will almost certainly evolve away from desktop computers and toward audio/visual interfaces. An agent might get more context from a zoom call with you, once tone and body language can be used to increase the bandwidth between you.

by andai27 minutes ago|

parent|

[-]

https://www.youtube.com/watch?v=GH9-EmgtABw

Saw this video recently, by an AI company working to get contextual cues from tone and body language. I think they're converting it to text and feeding it into a LLM, so not natively multimodal, but I still thought it was really cool.

by fragmede51 minutes ago|

parent|

prev|

[-]

I don't think written prompting will ever go away. Writing helps you organize your thoughts in a way that speaking, umm, ah, wait no, hang on, does not. With writing, I can go back and change what I've already written before I hit send. Anybody who's prompted with speech for any length of time has been "wait no nevermind start over". So STT will get better, sure, it's already quite good. I just don't see text entry entirely going away because Human Intelligence (HI) just doesn't work in a way that speech would be the only interface.

by MarkMarine4 hours ago|

prev|

[-]

Over a number of files similar to a codebase, that are well organized (like a codebase) the coding agents and harnesses are quite good at finding information, they clearly train on them so they will only improve.

The challenge is how to structure messy data as a filesystem the agent can use. That is a lot harder than querying a vector db for a semantic query.

The code bases we’ve been using agents in had been pruned and maintained over years, we’ve got principles like DRY that pushed us to put the answer in one place… implicitly building and maintaining that graph with all the actors in the system invested in maintaining this. This is not the case for messy data, so while I see the author's point and agree that a filesystem is a better structure for context over time, we haven’t supplanted search yet for non-code data.

by dzello6 hours ago|

prev|

[-]

Resonates deeply with me. I’ve moved personal data out of ~10 SaaS systems into a single directory structure in the last year. Agents pay a higher price for fragmentation than humans. A well-organized system of files eliminates that fragmentation. It’s enough for single player. I suspect we’ll see new databases emerge that enable low multi-player (safe writes etc) scenarios without making the filesystem data more opaque. Not unlike what QMD is for search.

by JoeAltmaier3 hours ago|

prev|

[-]

Digression: a file system is a terrible abstraction. The ceremonial file tree, where branches are directories and you have to hang your file on a particular branch like a Christmas ornament.

Relational is better. Hell, any kind of unique identifier would be nice. So many better ways to organize data stores.

by zarzavat2 hours ago|

parent|

[-]

Filesystems have a property that changes preserve locality. A change made to one branch of the tree doesn't affect other branches (except for links). Databases lack this property: any UPDATE or DELETE can potentially affect any row depending on the condition. This makes them powerful but also scary. I don't want that every time I delete a file it potentially does a rm -rf / if I mistype the query.

The best compromise is what modern OSs have: a tree-like structure to store files but a database index on top for queries.
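A rough sketch of that compromise, in Python: the files stay in an ordinary directory tree, and a separate SQLite index is built from it for queries. The table layout and the function names `build_index` and `find_by_suffix` are assumptions for illustration, not how any particular OS indexer works.

```python
import os
import sqlite3
import tempfile

def build_index(root):
    """Walk a directory tree and record one row per file in an in-memory index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE idx (path TEXT PRIMARY KEY, name TEXT, size INTEGER, mtime REAL)"
    )
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            conn.execute(
                "INSERT INTO idx VALUES (?, ?, ?, ?)",
                (os.path.relpath(full, root), name, st.st_size, st.st_mtime),
            )
    return conn

def find_by_suffix(conn, suffix):
    """Query the index instead of walking the tree again."""
    rows = conn.execute(
        "SELECT path FROM idx WHERE name LIKE ? ORDER BY path", ("%" + suffix,)
    )
    return [r[0] for r in rows]
```

The index is disposable: deleting it loses nothing, because the tree remains the source of truth and the index can be rebuilt from it.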

by JoeAltmaier2 hours ago|

parent|

[-]

You can create the tree structure from a relation. Not a primitive data store operation at all. Just add the attribute: parent directory and voila.
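As a sketch of what "just add the attribute: parent directory" could look like, the Python below stores nodes in a single SQLite table with a `parent` column and rebuilds a full path with a recursive common table expression. The schema, the sample rows, and the function names are all assumptions made for illustration.

```python
import sqlite3

def make_store():
    """Create a single relation where each node points at its parent directory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes ("
        "id INTEGER PRIMARY KEY, name TEXT, parent INTEGER REFERENCES nodes(id))"
    )
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?, ?)",
        [(1, "", None), (2, "home", 1), (3, "bob", 2), (4, "notes.md", 3), (5, "etc", 1)],
    )
    return conn

def children(conn, node_id):
    """A directory listing is a plain lookup on the parent attribute."""
    rows = conn.execute(
        "SELECT name FROM nodes WHERE parent = ? ORDER BY name", (node_id,)
    )
    return [r[0] for r in rows]

def full_path(conn, node_id):
    """Walk parent pointers upward with a recursive CTE and join the names."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(id, name, parent, depth) AS (
            SELECT id, name, parent, 0 FROM nodes WHERE id = ?
            UNION ALL
            SELECT n.id, n.name, n.parent, c.depth + 1
            FROM nodes n JOIN chain c ON n.id = c.parent
        )
        SELECT name FROM chain ORDER BY depth DESC
        """,
        (node_id,),
    ).fetchall()
    return "/".join(r[0] for r in rows) or "/"
```

Listing a directory is a simple query, while reconstructing a path needs recursion, which is the trade-off between the tree view and the relational view being argued over in this subthread.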

So often we want to look up 'the last file I printed' or 'that message I got from Bob'. Instead of just creating that lookup, we have to go spelunking.

Hell, every major app creates its own abstractions because the OS/Filesystem doesn't have anything useful. Email systems organize messages and tags; document editors have collections of document aspects they store in a structured blob. Instead of asking the OS to do that.

by p_ing2 hours ago|

parent|

prev|

[-]

NTFS has a database, the MFT. It can index attributes, such as file names, which are a b+tree. A file's $DATA is also placed into the MFT, unless it doesn't fit, then NTFS allocates virtual cluster numbers (more MFT attributes) which point to the on-disk data structure of the file.

All files are represented in a table with rows and columns. "Directories" simply have a special "directory = true" attribute in a row (simplified).

The hierarchy is for you, the human.

Like many file systems, NTFS also contains a log for recoverability/rollback purposes.

It's not quite relational but it doesn't make sense to be relational. Why would you need more than one 'table' to contain everything you need to know about a file? Microsoft experimented with WinFS, which wasn't a traditional file system (it was an MSSQL database with BLOB storage which sat ontop of a regular NTFS volume). Performance was bad and Skydrive replaced the need for it (in the view of MSFT).

by dist-epoch1 hours ago|

parent|

[-]

The newest Microsoft filesystem, ReFS, removes the MFT. Because it created a lot of problems.

by p_ing1 hours ago|

parent|

[-]

> Because it created a lot of problems.

Please elaborate.

NTFS is still the better choice for common desktop usage. ReFS goals are centered around data integrity but it comes at the cost of performance.

by packetlost3 hours ago|

parent|

prev|

[-]

Files in most file systems are uniquely identified by inode and can be referenced by multiple files. Why does everyone forget links?
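For readers who have forgotten links, a small Python sketch on a POSIX-style filesystem: two directory entries created with `os.link` share one inode, so a write through either name is visible through the other. The file names and the `demo` helper are made up for illustration.

```python
import os
import tempfile

def same_file(path_a, path_b):
    """Two names refer to the same file when device and inode numbers match."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)

def demo():
    """Create a file, hard-link it, and write through the second name."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "report.txt")
        alias = os.path.join(d, "alias.txt")
        with open(original, "w") as f:
            f.write("v1")
        os.link(original, alias)      # second directory entry, same inode
        with open(alias, "a") as f:
            f.write("+v2")            # append through the alias
        with open(original) as f:
            content = f.read()
        # Link count is 2 because two names now point at the inode.
        return same_file(original, alias), os.stat(original).st_nlink, content
```

The standard library also offers `os.path.samefile`, which performs the same device-and-inode comparison.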

by JoeAltmaier50 minutes ago|

parent|

[-]

A dataset can persist across multiple file systems. A UUID is a way to know that one dataset is equivalent (identical) to another. Now you can cache, store-and-forward, archive and retrieve and know what you have.

by packetlost46 minutes ago|

parent|

[-]

UUIDs aren't very good for this use case, a sufficiently large CRC or cryptographic hash is better because it's intrinsically tied to the data's value while UUIDs are not
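A minimal Python sketch of the difference: an identifier derived from a cryptographic hash is a function of the bytes, so equal data always yields the same identifier and a stored blob can be re-verified against it, whereas a random UUID carries no information about the content. SHA-256 and the `ContentStore` class are choices made here for illustration, not something the thread specifies.

```python
import hashlib
import uuid

def content_id(data: bytes) -> str:
    """Identifier derived from the bytes themselves (SHA-256, hex encoded)."""
    return hashlib.sha256(data).hexdigest()

class ContentStore:
    """Hypothetical in-memory store keyed by content hash."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = content_id(data)
        self._blobs[key] = data   # identical data collapses onto one key
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # The key doubles as an integrity check on retrieval.
        if content_id(data) != key:
            raise ValueError("stored blob no longer matches its identifier")
        return data

    def __len__(self):
        return len(self._blobs)
```

Two independent copies of the same dataset get the same key without any coordination, which is what makes caching and store-and-forward checks possible; two calls to `uuid.uuid4()` for the same data would not agree.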

by mieubrisse3 hours ago|

parent|

prev|

[-]

I've been wondering this too: for us, UUIDs are super opaque. But for an agent, two UUIDs are distinct as day and night. Is the best filesystem just blob storage S3 style with good indexes, and a bit of context on where everything lives?

One thing directories solve: they're great grouping mechanisms. "All the Q3 stuff lives in this directory"

I bet we move towards a world where files are just UUIDs, then directory structures get created on demand, like tags.
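One way to picture that world is the Python sketch below: blobs are stored under opaque UUIDs, attributes are attached as tags, and a "directory" is just the result of a query. `TaggedStore` and its methods are hypothetical names, and the in-memory dictionaries stand in for whatever a real store would use.

```python
import uuid
from collections import defaultdict

class TaggedStore:
    """Blobs keyed by UUID, with directory-like views computed from tags."""

    def __init__(self):
        self._blobs = {}
        self._tags = defaultdict(set)   # (attribute, value) -> set of file ids

    def add(self, data, **attrs):
        """Store data under a fresh UUID and index it by each attribute."""
        file_id = str(uuid.uuid4())
        self._blobs[file_id] = data
        for pair in attrs.items():
            self._tags[pair].add(file_id)
        return file_id

    def view(self, **attrs):
        """Return ids matching every attr=value pair: a 'directory' built on demand."""
        sets = [self._tags.get(pair, set()) for pair in attrs.items()]
        return set.intersection(*sets) if sets else set(self._blobs)
```

With this shape, "all the Q3 stuff" is `store.view(quarter=3)`, and the same file can appear in any number of views without being moved or copied.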

by para_parolu3 hours ago|

parent|

[-]

A filepath is just a unique name that a model can identify easily and use to understand grouping. A UUID solves nothing but requires another mapping from file to short description.

by JoeAltmaier2 hours ago|

parent|

[-]

UUIDs solve oh so very, very much.

You can have several versions of the same set of data objects at once - an entire source set for a build, all the names duplicate but tagged with 'revision' so they can be distinguished.

Hard to do that without a UUID at root, to use for unique identification of the particular 'particle' of the particular data set.

by JoeAltmaier2 hours ago|

parent|

prev|

[-]

Or, have a "Q" attribute and ask the file store for "Q=3"

All good.

by _pdp_1 hours ago|

prev|

[-]

In other words, file systems are an excellent way to organise information. I mean, yeah - we've been using them forever.

File systems are not a good abstraction mechanism for remote procedure calls, though. I think it's important to distinguish between the two, since I find there are a lot of articles conflating both - comparing MCPs to SKILLs, which are completely different things.

I think the confusion comes from the fact that MCP came before SKILLs, and there's a mental model where SKILLs are somehow "better than" MCPs. This is like saying local Word documents are better than a fully integrated collaborative office suite. It's just not the same thing.

The reason SKILLs work so well is because there's 50 years of accumulated knowledge of how to run rudimentary Unix tools.

the TLDR

File systems - organising information

MCP/APIs - remote procedure calls

by leonflexo5 hours ago|

prev|

[-]

I wonder how much of a lost in the middle effect there is and if there could be or are tools that specifically differentiate optimizing post compaction "seeding". One problem I've run into with open spec is after a compaction, or kicking off a new session, it is easy to start out already ~50k tokens in and I assume somewhat more vulnerable to lost in the middle type effects before any actual coding may have taken place.

by ramoz6 hours ago|

prev|

[-]

I think the real impact behind the scenes here is Bash(). Filesystem relevance is a bit coincidental to placing an agent on an operating system and giving it full capability over it.

by zmmmmm1 hours ago|

prev|

[-]

I don't think there's a lot that's magical about files beyond (a) they are native for LLMs and coding because they both process text and (b) when things are rapidly in flux, unstructured formats prosper because flexibility is king. Literally any fixed format you try and describe becomes rapidly outdated and fails to serve the purpose. For example it feels like MCP is already ageing like milk.

Which is mainly to say, trust me, this is a temporary state, the god of complexity is coming. It is utterly inevitable. The people who created React, Kubernetes, all those Java frameworks you hated etc didn't go away. They are right now thinking about how amazing it would be if you stacked ten different tools together with brand new structured file formats and databases. We already have "beads" and "gastown" where this is starting. Enjoy these times because a couple of years from now it will already be the end of the "fun" part I think.

by stephbook1 hours ago|

prev|

[-]

I'm not too deep into agentic coding, but I hadn't understood why people write `SOUL.md` files like no tomorrow. Does anyone think these will be called the same three years from now?

If you've got a coding convention, enforce it using a linter. Have the LLM write the rules and integrate it into the local build and CI tool.

Has no one ever thought about how – gasp – a future human collaborator would be onboarded?

by 0xbadcafebee4 hours ago|

prev|

[-]

Can we bring back Plan9 architecture now? It had what was essentially MCP. You make a custom device driver, and anything really can be a file. Not only that, but you network them, so a file on local disk could be a display on a remote host (or whatever). Just tell the agent to read/write files and it doesn't need to figure out either MCP or tool calls.

by bnjms3 hours ago|

parent|

[-]

This seems like the place to ask. What other big ideas have there been since everything-is-a-file? I’m not aware of any. And it seems like we want another layer of permissions on device & data access that we haven't had before.

by jmclnx8 hours ago|

prev|

[-]

Funny, decades ago (mid-80s), I had to write a one-time fix on what would now be a very low-memory system; the data in question had a unique key of 8 7-bit ASCII characters.

Instead of reading multi-meg data into memory to determine what to do, I used the file system and the program would store data related to the key in sub directories instead. The older people saw what I did and thought that was interesting. With development time factored in, doing it this way ended up being much faster and avoided memory issues that would have occurred.
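The approach can be sketched in a few lines of Python. The comment does not say how the subdirectories were laid out, so the two-characters-per-level split, the `records.txt` file name, and the function names below are assumptions; keys are also assumed to be alphanumeric so they are safe to use as path components.

```python
import os
import tempfile

def key_dir(root, key):
    """Map an 8-character ASCII key to a nested directory, two characters per level."""
    if len(key) != 8 or not (key.isascii() and key.isalnum()):
        raise ValueError("expected an 8-character alphanumeric ASCII key")
    parts = [key[i:i + 2] for i in range(0, 8, 2)]
    return os.path.join(root, *parts)

def append_record(root, key, line):
    """Append one line of data under the key; only that key's file is touched."""
    path = key_dir(root, key)
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "records.txt"), "a") as f:
        f.write(line + "\n")

def read_records(root, key):
    """Return all lines stored for a key, or an empty list if it was never written."""
    try:
        with open(os.path.join(key_dir(root, key), "records.txt")) as f:
            return f.read().splitlines()
    except FileNotFoundError:
        return []
```

Memory use stays flat regardless of the total data size, because the directory tree does the lookup that an in-memory table would otherwise have to do.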

So with AI, back to the old ways I guess :)

by bsenftner6 hours ago|

parent|

[-]

Reminds me of early data driving approaches. Early CD based game consoles had memory constraints, which I sidestepped by writing the most ridiculously simple game engine: the game loop was all data driven, and "going somewhere new" in the game was simply triggering a disc read given a raw sector offset and the number of sectors. That read was then a repeated series of bytes to be written at the memory address given by the first 4 bytes read, with the next 4 bytes giving how many bytes to copy. That simple mechanism, paired with a data organizer for creating the disc images, enabled some well known successful games to have "huge worlds" with an executable under 100K, leaving the rest of the console's memory for content assets, animations, whatever.
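The record format described above (a 4-byte destination address, a 4-byte byte count, then the payload, repeated) can be modelled in Python as follows. Little-endian byte order, the use of a `bytearray` as stand-in for console memory, and the function names are assumptions; the comment does not give those details.

```python
import struct

# Assumed layout: little-endian unsigned 32-bit address, then 32-bit byte count.
HEADER = struct.Struct("<II")

def pack_chunks(chunks):
    """Serialize (address, payload) pairs into one record stream, as a disc image tool might."""
    return b"".join(HEADER.pack(addr, len(data)) + data for addr, data in chunks)

def apply_chunks(memory, stream):
    """Copy each record's payload into memory at the address its header names."""
    offset = 0
    # Trailing bytes shorter than a header (e.g. sector padding) are ignored.
    while offset + HEADER.size <= len(stream):
        addr, count = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        payload = stream[offset:offset + count]
        if len(payload) != count or addr + count > len(memory):
            raise ValueError("truncated record or write outside memory")
        memory[addr:addr + count] = payload
        offset += count
    return memory
```

The loader itself knows nothing about levels or assets; all of that knowledge lives in the tool that builds the stream, which is what keeps the executable small.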

by alexjplant5 hours ago|

parent|

[-]

Which games were these out of interest? I enjoy reading about game dev from the nascent era of 3D on home consoles (on the Saturn in particular) and would love to hear more.

by bsenftner5 hours ago|

parent|

[-]

Tiger Woods Golf PSX was one, RoadRash3D0 another. Dozens that were never popular too.

by TacticalCoder7 hours ago|

prev|

[-]

As TFA basically says: files on a filesystem is a DB. Just a very crude one. There aren't nice indexes for a variety of things. "Views" are not really there (arguably you can create different views with links but it's, once again, very crude). But it's definitely a DB, represented as a tree indeed as TFA mentions.

My life's data, including all the official stuff (bank statements, notary acts, statements made to the police [witness, etc.], insurance, property titles), all my coding projects, all the family pictures (not just the ones I took) and all the stuff I forgot, is in files, not in a dedicated DB. But these files are definitely a database.

And because I don't want to deal with data corruption and even less want to deal with syncing now-corrupted data, many of my files contain, in their filename, a partial cryptographic checksum. E.g. "dsc239879879.jpg" becomes "dsc239879879-b3-6f338201b7.jpg" (meaning the Blake3 hash of that file has to begin with 6f338201b7 or the file is corrupted).
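A sketch of the naming scheme in Python. BLAKE3 is not in the Python standard library, so this example swaps in `hashlib.blake2b` and uses a `b2` tag instead of `b3`; the helper names and the 64 KiB read size are likewise assumptions. Only the 10-hex-digit prefix length is taken from the example above.

```python
import hashlib
import os
import re
import tempfile

TAG = "b2"        # stand-in tag: BLAKE2b from the stdlib, not the BLAKE3 used in the comment
PREFIX_LEN = 10   # hex digits kept in the filename, as in the comment's example

def digest_prefix(path):
    """Hash the file in blocks and return the first PREFIX_LEN hex digits."""
    h = hashlib.blake2b()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()[:PREFIX_LEN]

def tagged_name(path):
    """Return the path with '-<tag>-<hash prefix>' inserted before the extension."""
    stem, ext = os.path.splitext(path)
    return f"{stem}-{TAG}-{digest_prefix(path)}{ext}"

def verify(path):
    """True if the hash prefix embedded in the filename matches the file's content."""
    pattern = rf"-{TAG}-([0-9a-f]{{{PREFIX_LEN}}})(\.[^.]*)?$"
    m = re.search(pattern, os.path.basename(path))
    return bool(m) and m.group(1) == digest_prefix(path)
```

Because the expected hash travels with the name, any copy of the file can be checked on its own, without a separate manifest or database.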

At any time, if I want to, I can import these in "real" dedicated DBs. For example I can pass my pictures as a read-only to "I'm Mich" (immich) and then query my pictures: "Find me all the pictures of Eliza" or "Find me all the pictures taken in 2016 on the french riviera".

But the real database of all my life is and shall always be files on a filesystem.

With a "real" database, a backup can be as simple as a dump. With files, backing up involves... making sure you keep a proper version of all your files.

I'd say files are even more important than the filesystem: a backup on a BluRay disc or on an ext4-formatted SSD or on an exfat formatted SSD or on a tape... Doesn't matter: the files are the data.

A filesystem is the first "database" with these data: a crude one, with only simple queries. But a filesystem is definitely a database.

The main advantage of this very simple database is that as long as the data are accessible, you know your data is safe and can always use them to populate more advanced databases if needed.

by euroderf4 hours ago|

parent|

[-]

It's not "crude" if you get hierarchical organization without having to screw around with RECURSIVE, or "closure this" and "closure that". It just works.
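To make the comparison concrete, here is a rough sketch that lists the same small hierarchy two ways: once from directories with `pathlib`, once from a table with a recursive CTE. The `nodes(id, parent_id, name)` schema, the sample names, and the function names are hypothetical, chosen only for illustration.

```python
import sqlite3
import tempfile
from pathlib import Path


def subtree_fs(root: Path) -> list[str]:
    """All descendants of root, as sorted paths relative to root."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


def subtree_sql(conn: sqlite3.Connection, root_id: int) -> list[str]:
    """The same query against a hypothetical nodes(id, parent_id, name) table."""
    rows = conn.execute(
        """
        WITH RECURSIVE tree(id, path) AS (
            SELECT id, name FROM nodes WHERE parent_id = ?
            UNION ALL
            SELECT n.id, tree.path || '/' || n.name
            FROM nodes AS n JOIN tree ON n.parent_id = tree.id
        )
        SELECT path FROM tree
        """,
        (root_id,),
    ).fetchall()
    return sorted(r[0] for r in rows)


def demo() -> tuple[list[str], list[str]]:
    # Build the same small hierarchy twice: once as directories, once as rows.
    root = Path(tempfile.mkdtemp())
    (root / "photos" / "2016").mkdir(parents=True)
    (root / "photos" / "2016" / "riviera.jpg").touch()
    (root / "bank").mkdir()

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?, ?)",
        [(1, None, "root"), (2, 1, "photos"), (3, 2, "2016"),
         (4, 3, "riviera.jpg"), (5, 1, "bank")],
    )
    return subtree_fs(root), subtree_sql(conn, 1)
```

Both functions return the same list of paths; the filesystem version is a single `rglob` call, while the table version needs the recursive query.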

by rzerowan4 hours ago|

parent|

prev|

[-]

Were it more portable, BeOS/Haiku's BeFS would have been a perfect fit in this instance, seeing that it is a filesystem that has database properties via extended attributes[1] and indexing.

Were Haiku more mature/stable, it would have been a nice fit as the OS for LLM/AI personal use cases.

[1] https://arstechnica.com/information-technology/2018/07/the-b...

by ciupicri5 hours ago|

parent|

prev|

[-]

Why Blake3 and not say XXH3 64/128 bits (https://xxhash.com/)?

by heavyset_go5 hours ago|

parent|

prev|

[-]

You can get views by using namespaces/cgroups

by istillwritecode5 hours ago|

prev|

[-]

Except Android and iOS are both trying to keep you away from your own files.

by Gigachad33 minutes ago|

parent|

[-]

Kind of? iOS does have a file manager which explicitly shows you your own files. They just made a separation between OS/program files and the user's own files. What killed files more was cloud programs where multiple users can edit at the same time, which required a system that was more sophisticated than syncing a file.

by jnsaff24 hours ago|

prev|

[-]

Here’s me getting excited that a new file system is being developed but alas, just talk about text files.

by galsapir7 hours ago|

prev|

[-]

nice, esp. liked - "our memories, our thoughts, our designs should outlive the software we used to create them"

by SoftTalker3 hours ago|

parent|

[-]

Weird. My memories and thoughts are not created by software.

by fogzen2 hours ago|

prev|

[-]

Does this really have to do with file systems? Replacing RAG/context stuffing with tool calls for data access seems like the actual change. Whether the tool call is backed by a file system or DB or whatever shouldn’t matter, right?
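One way to picture the question: a rough sketch in which the same tool handler is served by either a directory or a SQLite table. All names here (`DocumentStore`, `FileStore`, `SqliteStore`, `read_document_tool`, and the `files(filename, content)` table) are hypothetical and not taken from any agent framework.

```python
import sqlite3
from pathlib import Path
from typing import Protocol


class DocumentStore(Protocol):
    """Anything that can return a document's text by name."""

    def read(self, name: str) -> str: ...


class FileStore:
    """Backend that reads documents from a directory."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def read(self, name: str) -> str:
        return (self.root / name).read_text(encoding="utf-8")


class SqliteStore:
    """Backend that reads documents from a files(filename, content) table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def read(self, name: str) -> str:
        row = self.conn.execute(
            "SELECT content FROM files WHERE filename = ?", (name,)
        ).fetchone()
        if row is None:
            raise FileNotFoundError(name)
        return row[0]


def read_document_tool(store: DocumentStore, name: str) -> str:
    """Hypothetical tool handler: the caller never sees which backend answered."""
    return store.read(name)
```

From the model's side both backends look identical; what differs is which general-purpose tools (grep, diff, git) already work on the stored data without extra code.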

by jonstewart6 hours ago|

prev|

[-]

It reminds me a lot of Hans Reiser’s original white paper, which can be found at https://web.archive.org/web/20070927003401/http://www.namesy.... Add some embeddings and boom.

by naaqq7 hours ago|

prev|

[-]

This article said some things I couldn’t put into words about different AI tools. Thanks for sharing.

by BoredPositron6 hours ago|

prev|

[-]

I revived my Johnny Decimal system as my single source of truth for almost anything and couldn't be happier. The filing is done mostly by agents now but I still have the overview myself.

by ciupicri5 hours ago|

parent|

[-]

Could you give us more details about your system?

by rafaepta6 hours ago|

prev|

[-]

Great read. Thanks for sharing

by bsenftner5 hours ago|

prev|

[-]

I don't think this paradigm will last, or be what becomes the more common structure in the future. This will still suffer from conflicts of persona and objective, plus it has the issue that individual apps will need protected file hierarchies to prevent malicious injections. I don't see this as a solution, just a deck chair shuffle.

I've been researching and building with a different paradigm, an inversion of the tool calling concept that creates contextual agents of limited scope, but pipelines of them, with the user in triplicate control of agent as author, operator of an application with a clear goal, and conversationally cooperating on a task with one or more agents.

I create agents that are inside open source software, making that application "intelligent", and the user has control to make the agent an expert in the type of work that human uses that software for. Imagine a word processor that, when used by a documentation author, has multiple documentation agents that co-work with the author, while that same word processor, when used by, for example, a romance novelist, has similar agents but experts in a different literary / document goal. Then do this with spreadsheets, and project management software, and you get an intelligent office suite with amazing levels of user assistance.

In this structure, context/task specific knowledge is placed inside other software, providing complex processes to the user they can conversationally request and compose on the fly, use and save as a new agent for repeated use, or discard as something built for the moment. The agents are inside other software, with full knowledge of that application in addition to task knowledge related to why the user is using that software. It's a unified agent creation and use and chain-of-thought live editing environment, in context with what one is doing in other software.

I wrap the entire structure into a permission hierarchy that mirrors departments, projects, and project staff, creating an application suite structure more secure than this Filesystems approach, with substantially more user controls that do not expose the potential for malicious application. The agents are each for a specific purpose, which limits their reach and potential for damage. Being purpose built, the users (who are task focused, not developers) easily edit and enhance the agents they use because that is the job/career they already know and continue to do, just with agent help.

by visarga2 hours ago|

parent|

[-]

Your project, while interesting as an approach, is orders of magnitude more complex than the proposition here, which is to rely on agents' skills with file systems, bash, python, sed, grep and other CLI tools to find and organize data, but also maintain their own skills and memories. LLMs have gained excellent capabilities with files and can generate code on the fly to process them. It's people realizing that you can use a coding agent for any cognitive work, and it's better since you own the file system while easily swapping the model or harness.

I personally use a graph-like format but organized as a simple text file, each node prefixed with [id] and referencing other nodes inline by [id]. This works well with replace, diff and git, and is navigable at larger scales without reading everything. Every time I start work I have the agent read it, and at the end update it. This ensures continuity over weeks and months of work. This is my take on file system as memory: make it a graph of nodes, but keep it simple. A flat text file, don't prescribe structure, just node size. It grows organically as needed; I once got one to 500 nodes.
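A minimal sketch of how such a file could be read back into a graph. The exact syntax is an assumption: here a node starts on a line beginning with [id], continues until the next such line, and any other [id] in its body counts as a reference. The function names and the sample text are hypothetical.

```python
import re

NODE_START = re.compile(r"^\[(\w+)\]\s*(.*)$")  # "[n1] text..." opens a node
REF = re.compile(r"\[(\w+)\]")                  # any inline "[id]"


def parse_nodes(text: str) -> dict[str, str]:
    """Map each node id to its body text (continuation lines are joined)."""
    parts: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        m = NODE_START.match(line)
        if m:
            current = m.group(1)
            parts[current] = [m.group(2)]
        elif current is not None and line.strip():
            parts[current].append(line.strip())
    return {nid: " ".join(body).strip() for nid, body in parts.items()}


def edges(nodes: dict[str, str]) -> dict[str, list[str]]:
    """For each node, the sorted ids of other known nodes it references."""
    return {
        nid: sorted({r for r in REF.findall(body) if r in nodes and r != nid})
        for nid, body in nodes.items()
    }


SAMPLE = """\
[n1] Project goal: migrate the photo archive. Depends on [n2].
[n2] Checksums are embedded in filenames,
     see [n3] for the open question.
[n3] Open question: which hash to use?
"""
```

Calling `edges(parse_nodes(SAMPLE))` gives the adjacency list, so an agent can follow references from one node without reading the whole file.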

by bsenftner1 minutes ago|

parent|

[-]

It ends up being similar to how early PC software was written before people realized malicious software could be running. There used to be little to no memory safety between running programs, and this treatment of files as the contextual running memory is similar. It's a great idea until a security perspective is factored in. It will need to end up being very much like closed applications and their writing of proprietary files, which will need some security layer that is not there yet.