It has connectors for Postgres and other stores, so I find it faster to connect to a Postgres instance and pull all of the data from a table into DuckDB. Even if the table is around 50GB, a machine with 30 cores will pull from Postgres using 30 cores in parallel, so it only takes a minute or two. After that, analytical queries on the data (GROUP BY, regexp_replace, count(distinct ...), etc.) are 10+ times faster in DuckDB than in native Postgres.
This will give you some experience, and you'll start to see applicable problem spaces for DuckDB in product areas, especially anything involving BI or DW.
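A rough sketch of the Postgres pull described above, using DuckDB's postgres extension from Python. The connection string, schema, table name, and file name are placeholders, and the thread count is only an illustration; how much parallelism you actually get depends on the extension's settings and on the Postgres server. The `duckdb` package is third-party and is imported lazily here.

```python
def postgres_pull_statements(pg_dsn: str, schema: str, table: str, threads: int) -> list[str]:
    """Build the DuckDB SQL that copies one Postgres table into a local DuckDB table.

    pg_dsn is a libpq-style connection string; it must not contain single quotes,
    because this sketch does no escaping.
    """
    return [
        "INSTALL postgres",
        "LOAD postgres",
        f"SET threads = {int(threads)}",
        f"ATTACH '{pg_dsn}' AS pg (TYPE postgres, READ_ONLY)",
        # Materialize the remote table locally so later queries never touch Postgres.
        f'CREATE OR REPLACE TABLE "{table}" AS SELECT * FROM pg."{schema}"."{table}"',
    ]


def pull_table(duckdb_path: str, pg_dsn: str, schema: str, table: str, threads: int = 8):
    """Run the statements above against a local DuckDB file and return the connection."""
    import duckdb  # third-party: pip install duckdb

    con = duckdb.connect(duckdb_path)
    for stmt in postgres_pull_statements(pg_dsn, schema, table, threads):
        con.execute(stmt)
    return con


# Hypothetical usage (needs a reachable Postgres instance):
# con = pull_table("local.duckdb", "dbname=app host=localhost user=readonly", "public", "events", threads=30)
# con.sql("SELECT user_id, count(DISTINCT session_id) FROM events GROUP BY user_id").show()
```

Once the table is copied, the analytical queries run entirely against the local DuckDB file.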
There are other embeddable options out there, but I found DuckDB a better fit for potentially massive datasets, and also because of how naturally it ingests the types of data we work with, some of its unique features, and how trivial it was to learn and integrate into the project.
Otherwise I use it almost daily for doing guardrailed data exploration with LLMs. I prefer SQL over random DSLs in AWS or Sentry or what have you. I’ll ingest the data I need and just run SQL against it. I mentioned in another comment that I’ll tend to store more useful data (especially data I export routinely, like infra cost reports) on S3 and use a Rill instance to do basic exploration in a GUI (it will query remote parquet files).
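To illustrate the "just run SQL against it" part for exports stored on S3, here is a minimal sketch that queries remote Parquet files directly with DuckDB. The bucket path and the column names (`service`, `usage_date`, `cost_usd`) are made up for the example, and S3 credentials are assumed to be configured separately.

```python
def cost_report_query(parquet_glob: str) -> str:
    """Build a monthly cost-per-service query over Parquet files.

    The column names are hypothetical; adjust them to the real export schema.
    """
    return (
        "SELECT service, date_trunc('month', usage_date) AS month, "
        "sum(cost_usd) AS total_usd "
        f"FROM read_parquet('{parquet_glob}') "
        "GROUP BY ALL ORDER BY month, total_usd DESC"
    )


def run_cost_report(parquet_glob: str):
    """Execute the query in an in-memory DuckDB and return the rows."""
    import duckdb  # third-party: pip install duckdb

    con = duckdb.connect()
    # httpfs lets DuckDB read s3:// and https:// paths.
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")
    # S3 credentials must already be available to DuckDB; that setup is not shown here.
    return con.execute(cost_report_query(parquet_glob)).fetchall()


# Hypothetical usage:
# rows = run_cost_report("s3://my-bucket/infra-costs/*.parquet")
```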
* fastapi + duckdb + parquet for the backend for a relatively high profile website
* wasm duckdb + react for a few visualization websites
* YAML-driven ETL from lots of sources, principally ugly spreadsheets, into usable data. More T than E or L really
For data I reference frequently, especially data I know will grow over time, I’ve started using Rill because it makes ad-hoc exploration very smooth and low-friction.
My process tends to be something like:
1. Explore logs or some other at least somewhat structured dataset
2. Use Claude to find useful patterns and determine how I might benefit from this data in ways I wasn’t yet aware of
3. See how often it’s useful for decision making
4. If it’s frequently useful, formalize it as a view in my Rill instance and refine the models to maximize their utility
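As an illustration of step 4, the sketch below turns an ad-hoc query into a named view using plain DuckDB SQL. This is not Rill's own model format, only the underlying idea; the view name and query are hypothetical, and `con` is assumed to be an open DuckDB connection.

```python
def view_statement(view_name: str, select_sql: str) -> str:
    """Wrap an exploratory SELECT in a CREATE OR REPLACE VIEW statement."""
    if not view_name.isidentifier():
        raise ValueError(f"not a simple identifier: {view_name!r}")
    # Drop surrounding whitespace and a trailing semicolon so the query embeds cleanly.
    body = select_sql.strip().rstrip(";")
    return f"CREATE OR REPLACE VIEW {view_name} AS {body}"


def promote_to_view(con, view_name: str, select_sql: str) -> None:
    """Register the query as a view on an existing DuckDB connection."""
    con.execute(view_statement(view_name, select_sql))


# Hypothetical usage with an open DuckDB connection `con`:
# promote_to_view(con, "errors_by_service",
#                 "SELECT service, count(*) AS errors FROM logs WHERE level = 'error' GROUP BY service;")
# con.sql("SELECT * FROM errors_by_service ORDER BY errors DESC").show()
```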