by jawns 23 hours ago

by phamilton 22 hours ago

My favorite lens on SQLite is that it is actually two things:

1. A robust durability implementation
2. A library of high-performance data structures and algorithms

The fact that it's SQL is nice, but those two attributes are what make it great.

For example, I'm implementing an in-process event log that I want to be durable. I started simple, but soon ran into edge cases, and instead of playing whack-a-mole I just switched to using SQLite as an ordered KV store that gives me ACID.
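A minimal sketch of that idea in Python, using the standard-library `sqlite3` module. The `events` table, its columns, and the helper names are assumptions made up for illustration, not the schema from the comment above.

```python
import sqlite3

def open_log(path=":memory:"):
    # Hypothetical schema: an auto-incrementing integer key keeps events ordered.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL)"
    )
    conn.commit()
    return conn

def append_events(conn, payloads):
    # `with conn` runs the inserts in one transaction: all commit or none do.
    with conn:
        conn.executemany(
            "INSERT INTO events (payload) VALUES (?)",
            [(p,) for p in payloads],
        )

def read_since(conn, seq):
    # Range read in key order, the "ordered KV store" part.
    return conn.execute(
        "SELECT seq, payload FROM events WHERE seq > ? ORDER BY seq", (seq,)
    ).fetchall()

conn = open_log()
append_events(conn, ["started", "job queued"])
try:
    append_events(conn, ["job done", None])  # None violates NOT NULL
except sqlite3.IntegrityError:
    pass  # the whole batch is rolled back, including "job done"
events = read_since(conn, 0)
```

Pass a file path instead of `":memory:"` to get the durable version. The failed batch leaves the log exactly as it was, which is the kind of edge case that is tedious to get right by hand.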

Another example: ingesting multiple interrelated datasets. Instead of a dozen hash maps in memory, I load them into SQLite (no persistence) and then slice and dice as I need to.
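Roughly what that could look like, as a sketch with an in-memory database. The `users` and `orders` tables and the sample rows are invented for illustration.

```python
import sqlite3

# Hypothetical sample data standing in for the real datasets.
users = [(1, "ada"), (2, "grace")]
orders = [(10, 1, 25.0), (11, 1, 5.0), (12, 2, 40.0)]

conn = sqlite3.connect(":memory:")  # nothing is written to disk
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)"
)
conn.executemany("INSERT INTO users VALUES (?, ?)", users)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)

# One join replaces the nested dictionary lookups.
totals = conn.execute(
    "SELECT u.name, COUNT(*), SUM(o.total) "
    "FROM users u JOIN orders o ON o.user_id = u.id "
    "GROUP BY u.id ORDER BY u.name"
).fetchall()
```

Each new question about the data becomes another query rather than another hand-built hash map.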

It's a super useful tool.

by rsalus 17 hours ago

This mirrors my own experience creating a persistent event log. I started with JSON, then JSONL, etc., until finally landing on SQLite.

by chaps 22 hours ago

The moment my JSON has any sort of depth and I need to write a parser for it and potentially account for unspecified behavior. JSON's nice when it's nice, but it's terrible when it's terrible. It's 100x easier to write SQL than writing jq and... dear god if I have to use grep -A or -B, I'm doing something wrong. Constraints are actually a good thing!

The underlying database isn't the most important thing. Just use SQL. Its namespacing (e.g., through CTEs) is good, and you're more likely to have colleagues who know SQL than jq.
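To make the CTE point concrete, here is a small sketch: nested JSON is flattened into a constrained table, then queried with named steps. The document shape, the `deploys` table, and the CTE names are all hypothetical.

```python
import json
import sqlite3

# Hypothetical nested document of the kind that gets awkward in jq.
doc = json.loads("""
{"services": [
  {"name": "api", "deploys": [{"env": "prod", "ok": true},
                              {"env": "dev", "ok": false}]},
  {"name": "web", "deploys": [{"env": "prod", "ok": false}]}
]}
""")

conn = sqlite3.connect(":memory:")
# Constraints reject malformed rows at load time.
conn.execute(
    "CREATE TABLE deploys ("
    " service TEXT NOT NULL,"
    " env TEXT NOT NULL,"
    " ok INTEGER NOT NULL CHECK (ok IN (0, 1)))"
)
conn.executemany(
    "INSERT INTO deploys VALUES (?, ?, ?)",
    [(s["name"], d["env"], int(d["ok"]))
     for s in doc["services"] for d in s["deploys"]],
)

# Each CTE names one intermediate step, so the query reads top to bottom.
failing_prod = conn.execute("""
    WITH prod AS (SELECT service, ok FROM deploys WHERE env = 'prod'),
         failing AS (SELECT service FROM prod WHERE ok = 0)
    SELECT service FROM failing ORDER BY service
""").fetchall()
```

The same query text would run largely unchanged on other SQL engines, which is the point about the underlying database not mattering much.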

by sofixa 10 hours ago

> It's 100x easier to write SQL than writing jq and... dear god if I have to use grep -A or -B, I'm doing something wrong. Constraints are actually a good thing!

As an occasional consumer of JSON/CSV, that's why I really like DuckDB: it's just SQL for such file formats. And it manages to be super fast at it too.

by gopalv 20 hours ago

> an example of a case where you'd use SQLite instead of jq or grep through Markdown?

Usually we end up writing a script to incrementally refresh a data-set I'm analyzing (or have someone send me a copy after they pull it).

I've been using SQLite for anything that needs an UPDATE: modifying a row deep inside a JSONL data set is a pain.

My GitHub is full of Java programs that update sqlite3 files using thread pools and a single big lock around the UPDATE (and then I write, or have an agent write, code to analyze it).
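The same pattern, sketched in Python rather than Java: workers do the slow part in parallel and take one lock for the UPDATE. The `pages` table and the `measure` stand-in are assumptions for illustration.

```python
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor

# check_same_thread=False lets worker threads share this one connection;
# the lock below is what actually keeps that safe.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, size INTEGER)")
conn.executemany("INSERT INTO pages (id) VALUES (?)", [(i,) for i in range(100)])
conn.commit()

write_lock = threading.Lock()

def measure(page_id):
    # Hypothetical stand-in for slow work (fetching, parsing) done in parallel.
    return page_id * 2

def refresh(page_id):
    size = measure(page_id)
    with write_lock:  # only one thread touches the connection at a time
        conn.execute("UPDATE pages SET size = ? WHERE id = ?", (size, page_id))
        conn.commit()

with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(refresh, range(100)))  # list() surfaces worker exceptions

missing, total = conn.execute(
    "SELECT SUM(size IS NULL), SUM(size) FROM pages"
).fetchone()
```

SQLite allows only one writer at a time anyway, so a single coarse lock costs little as long as the expensive work happens outside it.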

DuckDB is slowly replacing it in the context of Python, simply because of the ease of pushing a UDF into the SQL.

Also because I really like expressing things as LEAD/LAG with a UDF on top.
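A sketch of the LAG-plus-UDF shape. To stay within the standard library it uses `sqlite3` rather than DuckDB, and assumes a bundled SQLite of 3.25 or newer for window functions. The `readings` table and the `pct_change` function are made up for illustration.

```python
import sqlite3

def pct_change(prev, cur):
    # UDF: percentage change between two readings; NULL if no previous row.
    if prev is None or prev == 0:
        return None
    return round(100.0 * (cur - prev) / prev, 1)

conn = sqlite3.connect(":memory:")
conn.create_function("pct_change", 2, pct_change)  # register the Python UDF
conn.execute("CREATE TABLE readings (day INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, 100.0), (2, 110.0), (3, 99.0)],
)

# LAG pulls the previous row's value; the UDF sits on top of it.
rows = conn.execute(
    "SELECT day, pct_change(LAG(value) OVER (ORDER BY day), value) "
    "FROM readings ORDER BY day"
).fetchall()
```

The window function handles the row-to-row bookkeeping, so the Python side only has to express the per-pair logic.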

by dogline 19 hours ago

UDF: User-Defined Function

by pokstad 19 hours ago

SQLite is more efficient for large data sets. A single Markdown or JSON file has to be scanned from the start to locate a piece of data, which is O(n). Updating an existing entry in a sequential file is even worse, because you have to rewrite the file. SQLite's B-tree indexes can find a row in O(log n) time.
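One way to see the difference is to ask SQLite for its query plan before and after adding an index. This is a sketch: the `notes` table and the index name are invented, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (slug TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO notes VALUES (?, ?)",
    [(f"note-{i}", "body text") for i in range(10_000)],
)

def plan(sql):
    # EXPLAIN QUERY PLAN rows end with a human-readable detail string.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT body FROM notes WHERE slug = 'note-9999'"
before = plan(query)  # no index yet: a full table scan, O(n)
conn.execute("CREATE INDEX notes_slug ON notes (slug)")
after = plan(query)   # B-tree search through the index, O(log n)
```

Here `before` reports a SCAN of the table and `after` reports a SEARCH using `notes_slug`.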

by fragmede 22 hours ago

The honest answer is: whenever your Markdown or JSON files get big enough that grep/jq takes long enough that you get bored waiting for it.

by embedding-shape 20 hours ago

> get to be big enough that grep/jq takes long enough

On a modern processor, that's typically gigabytes of data, right?

by bitexploder 19 hours ago

Practically, yes, but much earlier if agents are touching that data, in my experience. Tens of GB even if you design well.