SQLite Is a Library of Congress Recommended Storage Format

(sqlite.org)

I'm always inspired by SQLite. Overall I like it, but if you're not doing writes it's really overkill.

So I made a format that will never surpass SQLite, except that it's much lighter and faster and works on zstd-compressed files. It has really small indexes and can contain binaries or text, just like SQLite.

The wasm part that decompresses, reads, and searches the databases is only 38kb uncompressed (maybe 16kb gzipped). Compared to SQLite's 1.2mb of wasm and glue code, that's 3% of the size, yet searching and loading are much faster. My program isn't really column-based and isn't suitable for managing spreadsheets, but it's great for dictionaries and file archives of images and audio.

I ported the jbig2 decoder as a 17kb wasm module, so I can load monochrome scans that are 8kb per page and still legible.

https://github.com/tnelsond/peakslab

SQLite is very well engineered, PeakSlab is very simple.

reply
> Compare that to SQLite's 1.2mb of wasm and glue code

The current trunk is actually 1.7mb in its canonical unminified form (which includes more docs than code), split almost evenly between the WASM and JS pieces :/.

Disclosure: I'm its maintainer.

reply
I think actually this competes with the old BerkeleyDB: https://en.wikipedia.org/wiki/Berkeley_DB - which I now see is no longer BSD-licensed, and in any case has been rendered almost extinct by SQLite. It was used for basic on-disk key-value store work.
reply
Even BerkeleyDB tries to be mutable. What I'm doing doesn't need the mutability, so it's much more similar to dictionary formats (though probably simpler) than it is to a database. Though a lot of people do use full databases for immutable dictionary key-value stuff. I just couldn't get any database to work well enough for a PWA dictionary.
reply
A more standard solution would be cdb[0], although that doesn't support compressed data.

[0] https://cdb.cr.yp.to/ , https://en.wikipedia.org/wiki/Cdb_(software)

reply
It is crashing Safari.
reply
Perhaps a dumb question, but how do you get data into it if you're not doing writes?
reply
Think historical records of, say, share values for past years. You might have a single db for 1900-2000, for instance. Things like that.

Not everything needs to be real-time updated.

reply
I think it's just immutable once you've generated it. No need to update indexes or check consistency on writes, no need for transactions, etc.
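
In SQLite terms, the same build-once idea looks roughly like this; a sketch where words.tsv, the schema, and the filenames are all hypothetical, though immutable=1 is a real SQLite URI flag that skips locking entirely:

  import sqlite3

  # one-time build from a hypothetical two-column words.tsv (word<TAB>definition)
  conn = sqlite3.connect("dict.db")
  conn.execute("CREATE TABLE entries (word TEXT PRIMARY KEY, definition TEXT)")
  with open("words.tsv", encoding="utf-8") as f:
      conn.executemany("INSERT INTO entries VALUES (?, ?)",
                       (line.rstrip("\n").split("\t", 1) for line in f))
  conn.commit()
  conn.close()

  # from here on, only ever open it read-only
  ro = sqlite3.connect("file:dict.db?mode=ro&immutable=1", uri=True)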
reply
Generate it one time from a source TSV file or a folder of media.
reply
something something XKCD competing standards something something
reply
Creating something new for a different use case isn't pointless. It's like comparing inline skates to ice skates.
reply
Believe me, I tried sticking to SQLite or aard2 or stardict; they were just fundamentally inadequate, with no good PWA cross-platform tooling.
reply
Does this remain true now that SQLite has a WASM build?
reply
Yes, because originally when I started PeakSlab it used the SQLite wasm build.
reply
It doesn't even apply unless someone says that (1) there are too many "standards" and (2) so we are making this new standard (neither applies here). Someone made something.

We should really consider eventually retiring memes because they just end up as thought-terminating cliches.

This is of course referring to xkcd #927. How do I know that?

reply
I have always loved SQLite.

I have also heard that some firms ban its use.

Why?

Because it makes it SO easy to set up a database for your app that you end up with a super critical component of your application that looks exactly like a file. A file that can have any extension. And that file can be copied around to other servers. Even if there is PII in that file. Multiply this times the number of applications in your firm and you can see how this could get a little nuts.

DevOps and DBA teams would prefer that the database be a big, heavy iron thing that is very obviously a database server. And when you connect to it, that's also very obvious etc etc.

I still love SQLite though.

reply
The question is, do the same firms ban Excel? Excel spreadsheets often end up as shadow databases in unlikely places.
reply
I’ve worked at some organisations that have strict rules (not always strictly followed) about what can go in Excel spreadsheets, and where they have to be stored. The C drive is verboten. Some also have standards about classification and labelling of PII and sensitive data.
reply
This might catch flak, but generalizing, I would assume the people banning things are the same people who would use Excel for something where a database would be better. If so, that's why Excel isn't banned under the same conditions that would get SQLite banned.
reply
The sane thing would be to ban Excel and promote SQLite. Excel is often used for tabulated text (issue tracking), not calculations. That's a perfect use case for a relational DB.
reply
Excel is made for calculations. But if you make it hard to make a DB, people will abuse Excel as a DB.
reply
I mean, it might have been at first, but Microsoft figured out back in 1993 that the majority of users want lists without formulas, and they've strategized around that. IMHO, the biggest concession to this was when they added Power Query to core Excel in 2016.
reply
Excel has sheets for tables, columns and rows, primary keys (UNIQUE), foreign key references, etc., if you squint.

It doesn't require you to use all of that properly, but it's there.

reply
or reimplement excel with sqlite as a backend :-D

BTW, sqlite can run SQL queries on CSV files with a relatively simple one-liner command...
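
(The CLI route is .mode csv plus .import ahead of the query.) A rough Python equivalent, where data.csv and its columns are hypothetical:

  import csv
  import sqlite3

  conn = sqlite3.connect(":memory:")
  with open("data.csv", newline="", encoding="utf-8") as f:
      rows = list(csv.reader(f))
  header, body = rows[0], rows[1:]

  # build the table from the header row, then bulk-insert the rest
  cols = ", ".join(f'"{c}"' for c in header)
  marks = ", ".join("?" * len(header))
  conn.execute(f"CREATE TABLE data ({cols})")
  conn.executemany(f"INSERT INTO data VALUES ({marks})", body)
  print(conn.execute("SELECT count(*) FROM data").fetchone()[0])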

reply
and Excel has a GUI for forms
reply
Only where VBA is available. Not available for macOS versions, if I'm correct?
reply
IMO, almost any Excel file more than a month old should become read-only.
reply
You should consider the knock-on effects of this brilliant idea. Now there would be copies of spreadsheets younger than a month getting replicated 47 billion times, exponentially compounding the problem you're trying to solve.

This sounds like how we pass so many stupid laws. Nobody thinks about second-order effects.

reply
They generally cannot. But they do banish Access.
reply
Now that is different.

Access gets used as a shared DB, and that is quite easy to corrupt. It is much more cost-effective to have that in a proper central database (I suppose SQLite is better here as well)

reply
Excel is also a shared DB: it has supported multiple concurrent users accessing and modifying the same spreadsheet for decades.
reply
Do companies ban text files? Text files are used to store data.
reply
Do companies ban data centers? It's crazy to send PII to other computers on the line.
reply
Do companies ban brains? Brains are used to store data.
reply
There are interesting uses for sqlite, like this one: https://sqlite.org/sqlar.html
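
sqlar is pleasantly transparent: it's an ordinary SQLite database with a single sqlar(name, mode, mtime, sz, data) table, where data is zlib-compressed whenever that's smaller than the original. A minimal extraction sketch (archive.sqlar is a hypothetical filename):

  import sqlite3
  import zlib

  conn = sqlite3.connect("archive.sqlar")
  for name, sz, data in conn.execute("SELECT name, sz, data FROM sqlar"):
      if data is None:          # directories and symlinks store no blob
          continue
      # sz == len(data) means the blob was stored uncompressed
      content = data if len(data) == sz else zlib.decompress(data)
      print(name, sz, len(content))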
reply
DevOps and DBAs must hate RAM and caches.
reply
That's so dumb
reply
> DevOps and DBA teams

Ah so two teams nobody should listen to.

reply
At least I would take it with a grain of salt when the DBA wants you to depend more on the DBA.
reply
Same with devops tbh.

"Hey everyone, we need to chose the option that involves us the most and provides us the most job security"

reply
Well... eventually the company learns the lesson the hard way, either because a site goes down or gets 0wned. Then everyone will cry about "how could this happen", and the ops people will tell them "we warned you this would happen, here are the receipts, now GTFO".
reply
I went from thinking "SQLite is a toy product, not reliable for real data" to "let's use SQLite for almost everything".

SQLite is very good if you can fit into the single-writer, multiple-readers pattern; you'll never lose data if you use the correct settings, which take a minute of Googling to figure out.

Today, most of my apps are simply a Go binary + SQLite + a systemd service file.

I've yet to lose data. Performance is great and plenty for most apps.

reply
The single writer is less of an issue in practice than it's made out to be. Modern NVMe drives are incredible, and it's trivial to get 5k writes per second in an optimized WAL setup. Way more than most apps could ever dream of.

And even then, I've used a batch writer pattern to get 180k writes per second on a commodity VPS.
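
A minimal sketch of that batch-writer idea, assuming a single writer thread draining a queue into one transaction per batch (the table, filename, and batch size are all hypothetical):

  import queue
  import sqlite3
  import threading

  q = queue.Queue()

  def writer() -> None:
      conn = sqlite3.connect("app.db")
      conn.execute("PRAGMA journal_mode=WAL")
      conn.execute("CREATE TABLE IF NOT EXISTS events (payload TEXT)")
      while True:
          batch = [q.get()]                  # block until at least one row arrives
          while not q.empty() and len(batch) < 1000:
              batch.append(q.get_nowait())   # sole consumer, so this can't race
          with conn:                         # one commit per batch, not per row
              conn.executemany("INSERT INTO events (payload) VALUES (?)", batch)

  threading.Thread(target=writer, daemon=True).start()
  q.put(("hello",))  # producers just enqueue single-element tuples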

reply
all* of that + sharding -> https://sqlite.org/lang_attach.html

ex: main.db + fts.db. reading and writing to main.db is always available; updating the fts index can be done without blocking the main database — it only needs to read, and the reads can be chunked and delayed. fts.db keeps the index + a cursor table — an id or last-change timestamp

could also use a shard to handle tables for metrics, or simply move old data out of main.db

* some examples:

  import sqlite3

  conn = sqlite3.connect("data.db")
  conn.execute("PRAGMA journal_mode=WAL")        # concurrent reads (see above)
  conn.execute("PRAGMA synchronous=NORMAL")      # fsync at checkpoint, not every commit
  conn.execute("PRAGMA cache_size=-62500")       # ~61 MB page cache (negative = KiB)
  conn.execute("PRAGMA temp_store=MEMORY")       # temp tables and indexes in RAM
  conn.execute("PRAGMA busy_timeout=5000")       # wait 5s on lock instead of failing

edit: orms will obliterate your performance — use raw queries instead. just make sure to run static analysis on your code base to catch sqli bugs.

my replies are being ratelimited, so let me add this

the heavy-duty server other databases have is doing the load-bearing work that folks complain sqlite can't do

the real dbms's are doing mostly the same work that sqlite does, you just don't have to think about it once they're set up. behind that chunky server process, the database is still dealing with writing your data to a filesystem, handling transaction locks, etc.

by default sqlite gives you a stable database file: when you see the transaction complete, it means the changes have been committed to storage and cannot be lost even if the machine were to crash right after.

you can decide to waive some, or all, of those guarantees in exchange for performance, and it doesn't even have to be an all-or-nothing situation.

reply
Oh fun, something I have some metrics on! I just made this benchmark of every PHP ORM a few weeks ago, for fun.

https://the-php-bench.technex.us/

There's a huge performance difference between memory and file storage within sqlite itself. Not even getting into tuning specifics.

reply
I usually try to explain it like this: "single writer" is rarely a real problem, because the writer is not slow. It writes exclusively, but very quickly.

"Batch writer pattern" is a good idea to get rid of expensive commits.

reply
> As of this writing (2018-05-29) ...

So this news is nearly eight years old (I first wrote six). But I didn't happen to know about it until now, so that's not a complaint at all; rather, this is a thank-you for posting it.

(Thanks for the correction. Brief brain malfunction in the math department there).

reply
Sir, it's 2026. It's 8 years old.
reply
Not if the GP was written 2 years ago :)
reply
Corrected; thanks.
reply
I was going to say, I was having déjà vu reading this.
reply
I used SQLite for a few applications several years ago. One time, the database got corrupted and all the data was lost. That was the day I stopped using SQLite.

Also, the lack of enforced column data types was always a negative for me.
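
For context, column types in SQLite are affinities rather than constraints, so an insert like this succeeds silently (newer SQLite does offer STRICT tables that reject it):

  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.execute("CREATE TABLE t (n INTEGER)")
  conn.execute("INSERT INTO t VALUES ('not a number')")   # no error raised
  print(conn.execute("SELECT n, typeof(n) FROM t").fetchall())
  # -> [('not a number', 'text')]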

reply
No matter the medium, backups are a must.
reply
For public-sector data preservation, it may be one of the best options.

- The specification is publicly available
- It is widely adopted
- It is likely to remain readable in the future
- It has little dependency on specific operating systems or services
- It carries low patent risk

From the perspective of long-term continuity, avoiding dependence on any particular company or service is extremely important.

reply
Archivists also love formats close to native. SQLite lets relationships between tables be preserved in a way that CSV cannot.
reply
That's certainly true. The ability to define table relationships is a major difference from CSV.
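
A small sketch of what that buys you over flat files (hypothetical schema; note that foreign-key enforcement is off by default in SQLite):

  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.execute("PRAGMA foreign_keys = ON")   # must be enabled per connection
  conn.executescript("""
      CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
      CREATE TABLE books (
          id INTEGER PRIMARY KEY,
          title TEXT,
          author_id INTEGER NOT NULL REFERENCES authors(id)
      );
  """)
  # inserting a book with an unknown author_id now fails loudly,
  # instead of silently dangling the way a CSV row would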
reply
I love SQLite, and thanks for sharing it, but there should be a "(2018)" at the end of the title:

> As of this writing (2018-05-29) the only other recommended storage formats for datasets are XML, JSON, and CSV.

reply
FYI, they added a lot more formats to the list after that.

  Preferred
  
  1. Platform-independent, character-based formats are preferred over native or binary formats as long as data is complete, and retains full detail and precision. Preferred formats include well-developed, widely adopted, de facto marketplace standards, e.g.
    a. Formats using well known schemas with public validation tool available
    b. Line-oriented, e.g. TSV, CSV, fixed-width
    c. Platform-independent open formats, e.g. .db, .db3, .sqlite, .sqlite3
  
  2. Any proprietary format that is a de facto standard for a profession or supported by multiple tools (e.g. Excel .xls or .xlsx, Shapefile)
  
  3. Character Encoding, in descending order of preference:
    a. UTF-8, UTF-16 (with BOM),
    b. US-ASCII or ISO 8859-1
    c. Other named encoding
  
  ---
  
  Acceptable
  
  For data (in order of preference):
  
  1. Non-proprietary, publicly documented formats endorsed as standards by a professional community or government agency, e.g. CDF, HDF
  2. Text-based data formats with available schema
  
  For aggregation or transfer:
  
  1. ZIP, RAR, tar, 7z with no encryption, password or other protection mechanisms.

https://www.loc.gov/preservation/resources/rfs/data.html
reply
.7z being there just discredits the entire process. The underlying compression algorithm is a free choice and can be anything[0], or contain bugs and exploits[1]. Personally I use only zstd with .7z, which is 'non-standard' relative to the official (Russian) release.

[0]: https://7-zip.org/7z.html

[1]: CVE-2025-0411

reply
I love using zstd, it's so fast to decompress. I especially like that the JavaScript decoder is 8kb and still really fast. Though the 25kb wasm decoders are about twice as fast.

What are the advantages or reasons to use zstd in a 7z container versus just .zst?

reply
On a recent project I needed to use exFAT. exFAT is terrible for a number of reasons, but in my case the thing I had to deal with was the lack of journaling, which meant files could be corrupted if there were a power interruption or something.

I initially was writing a series of files and doing some quasi-append-only things with new files and compacting the old one to sort of reinvent journaling. What I did more or less worked but it was very ad hoc and bad and was probably hiding a lot of bugs I would eventually have to fix later.

And then I remembered SQLite. I realized that ACID was probably safe enough for my needs, and that all the hard parts I was reinventing would probably be faster and less likely to break if I used something thoroughly audited and tested, so I reworked everything I was doing to use SQLite, and it worked fine.

I wish exFAT would die in a fire and a journaling filesystem would replace it as the "one filesystem you can use everywhere", but until it does I'm grateful SQLite exists.

reply
The problem with that is you didn't solve your biggest actual problem; you just haven't had it bite you in the ass yet, so you think your problem is solved.
reply
I am not sure the problem is actually fully solvable. I think SQLite helps at least a little.
reply
> I wish exFAT would die in a fire and a journaling filesystem would replace it as the "one filesystem you can use everywhere"

Where exactly is everywhere? Win32? All of Linux? BSDs? macOS? iOS? ...

reply
Everywhere exFAT is supported now. Windows, Mac, Linux, FreeBSD would be fine.
reply
Presumably Microsoft fear making it easy to swap OSes and access the same data.

"I can use Linux because if I get stuck I can just switch to Windows and still access my data" is a comfort that probably keeps people from even trying Linux (or other OSes)?

Why else would MS not support BTRFS/ZFS/Ext or whatever?

{I'm not saying that I think this works.}

reply
> Why else would MS not support BTRFS/ZFS/Ext or whatever?

You seriously can't think of another reason? File systems are complex. Maintenance is a huge burden. Getting them wrong is a liability. Reason enough to only support the bare minimum. And then, 99% of their users don't care about any of this. NTFS is good enough.

reply
Something MacOS and Windows support natively would be a good start, it could grow from there.
reply
Looking at *all* my external drives now... that would be great.
reply
I'm surprised they included proprietary formats that are a de facto standard for a profession or supported by multiple tools (.xls, .xlsx) in the preferred section [1]. I wonder if "well-known enough" is as good as "open" from a preservation standpoint.

[1] https://www.loc.gov/preservation/resources/rfs/data.html

reply
Especially when Office 365 shows that not even Microsoft is capable of making software that can display Office files anymore... if you have a Word file that was created or has ever been modified by the Word application, working with it through Office 365 in a browser is such a pain. I've literally had images that are impossible to delete or move in the web version, and they will absolutely render in the wrong place.
reply
You can unzip the xlsx and read the XML inside. It's not the worst format by far.
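
A quick look needs nothing but the standard library (report.xlsx is a hypothetical file):

  import zipfile

  with zipfile.ZipFile("report.xlsx") as z:
      print(z.namelist())   # xl/workbook.xml, xl/worksheets/sheet1.xml, ...
      print(z.read("xl/worksheets/sheet1.xml")[:200])   # raw cell XML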
reply
It's so funny, because I was JUST telling a colleague of mine - another librarian - this exact fact about SQLite!
reply
I get annoyed at all the other DBs that require their own heavy-duty server process when, for 90% of my projects, there is only one client: my app server. Is there a DB that combines SQLite's embedded simplicity with higher concurrent write throughput?
reply
[flagged]
reply
Welcome to Hacker News! Please write in English here. Thank you in advance from a long-time member :)
reply
Translating the comments and looking at the bio, I wouldn't be surprised if this is a bot.
reply