I do a lot of experiments with regexes, and if you get used to the RE2 syntax that DuckDB uses, you can see up to 10-100x uplift in terms of speed compared to Postgres on things like regexp_matches(), regexp_extract(), etc (depending on query/table/machine specifics). It has quite powerful scripting with custom Macros, fixes a lot of annoyances of SQL for me compared to Postgres.
I think if you have access to a machine with a lot of RAM / cores and a beefy data set, then it's basically like a RAMdisk version of Snowflake running locally on your machine.
(and of course the fact that it makes it convenient to read CSV/parquet, read/write from S3, etc) - it's a very ergonomic tool.
It’s not really a database in the traditional sense, there is no ACID complexity, it’s a library that lets use write SQL to query a tabular data file.
There are companies that write cluster computing engines with duckdb as the byte-cruncher at their heart, but usually it's more like NumPy, Pandas or Polars on steroids. Or SQLite, but for running OLAP queries.
The key thing is that this scaled horizontally pretty much forever, since each vehicle had a fixed amount of data per year we could tightly control the performance characteristics of the analysis. Adding more vehicles didn't make things slower, just linearly more expensive.
I vaguely remember the data from those containers also being used to process some aggregate analysis (like the each vehicle-container would output some data that would be consumed by another job that did aggregates). But I don't remember the specifics.
[1]: I believe we used JSONL or parquet format, but I didn't work in that part of the stack directly