undefined

points

[-]

Filesystems have a property that changes preserve locality. A change made to one branch of the tree doesn't affect other branches (except for links). Databases lack this property: any UPDATE or DELETE can potentially affect any row depending on the condition. This makes them powerful but also scary. I don't want that every time I delete a file it potentially does a rm -rf / if I mistype the query.

The best compromise is what modern OSs have: a tree-like structure to store files but a database index on top for queries.

by JoeAltmaier3 hours ago|

parent|

[-]

You can create the tree structure from a relation. Not a primitive data store operation at all. Just add the attribute: parent directory and voila.

So often we want to look up 'the last file I printed' or 'that message I got from Bob'. Instead of just creating that lookup, we have to go spelunking.

Hell, every major app creates it's own abstractions because the OS/Filesystem doesn't have anything useful. Email systems organize messages and tags; document editors have collections of document aspects they store in a structured blob. Instead of asking the OS to do that.

by p_ing4 hours ago|

prev|

[-]

NTFS has a database, the MFT. It can index attributes, such as file names, which are a b+tree. A file's $DATA is also placed into the MFT, unless it doesn't fit, then NTFS allocates virtual cluster numbers (more MFT attributes) which point to the on-disk data structure of the file.

All files are represented in a table with rows and columns. "Directories" simply have a special "directory = true" attribute in a row (simplified).

The hierarchy is for you, the human.

Like many file systems, NTFS also contains a log for recoverability/rollback purposes.

It's not quite relational but it doesn't make sense to be relational. Why would you need more than one 'table' to contain everything you need to know about a file? Microsoft experimented with WinFS, which wasn't a traditional file system (it was an MSSQL database with BLOB storage which sat ontop of a regular NTFS volume). Performance was bad and Skydrive replaced the need for it (in the view of MSFT).

by dist-epoch3 hours ago|

parent|

[-]

The newest Microsoft filesystem, ReFS, remove the MFT. Because it created a lot of problems.

by p_ing3 hours ago|

parent|

[-]

> Because it created a lot of problems.

Please elaborate.

NTFS is still the better choice for common desktop usage. ReFS goals are centered around data integrity but it comes at the cost of performance.

by packetlost4 hours ago|

prev|

[-]

Files in most file systems are uniquely identified by inode and can be referenced by multiple files. Why does everyone forget links?

by JoeAltmaier2 hours ago|

parent|

[-]

A dataset can persist across multiple file systems. A UUID is a way to know that one dataset is equivalent (identical) to another. Now you can cache, store-and-forward, archive and retrieve and know what you have.

by packetlost2 hours ago|

parent|

[-]

UUIDs aren't very good for this use case, a sufficiently large CRC or cryptographic hash is better because it's intrinsically tied to the data's value while UUIDs are not

by mieubrisse4 hours ago|

prev|

[-]

I've been wondering this too: for us, UUIDs are super opaque. But for an agent, two UUIDs are distinct as day and night. Is the best filesystem just blob storage S3 style with good indexes, and a bit of context on where everything lives?

One thing directories solve: they're great grouping mechanisms. "All the Q3 stuff lives in this directory"

I bet we move towards a world where files are just UUIDs, then directory structures get created on demand, like tags.

by para_parolu4 hours ago|

parent|

[-]

Filepath is just unique name that model can identify easily and understand grouping. Uuid solves nothing but requires another mapping from file to short description.

by JoeAltmaier3 hours ago|

parent|

[-]

UUID solve oh so very, very much.

You can have several versions of the same set of data object at once - an entire source set for a build, all the names duplicate but tagged with 'revision' so they can be distinguished.

Hard to do that without a UUID at root, to use for unique identification of the particular 'particle' of the particular data set.

by JoeAltmaier3 hours ago|

parent|

prev|

[-]

Or, have to "Q" attribute and ask the file store for "Q=3"

All good.