You'd end up implementing your own homegrown versions of hash joins and predicate pushdown (e.g. skipping Parquet row groups entirely), plus your own heuristics for choosing the right strategy (i.e. planning).
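To give a feel for the hand-rolled work, here's a minimal pure-Python sketch of just the hash join piece, with one toy "planning" heuristic (build the hash table on the smaller input). `hash_join` and the dict-per-row layout are hypothetical, for illustration only. A real engine also handles spilling to disk, vectorized execution, other join types, and picking between join algorithms.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def hash_join(left: List[Row], right: List[Row], key: str) -> List[Tuple[Row, Row]]:
    """Hypothetical inner equi-join of two lists of dict rows on `key`."""
    # Toy planning heuristic: build the hash table on the smaller input.
    swapped = len(left) > len(right)
    build, probe = (right, left) if swapped else (left, right)

    # Build phase: bucket build-side rows by join key.
    table: Dict[Any, List[Row]] = defaultdict(list)
    for row in build:
        table[row[key]].append(row)

    # Probe phase: look up each probe-side row's key (.get avoids adding empty buckets).
    out: List[Tuple[Row, Row]] = []
    for row in probe:
        for match in table.get(row[key], ()):
            # Always emit (left_row, right_row), whichever side was used to build.
            out.append((row, match) if swapped else (match, row))
    return out


users = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
orders = [{"id": 1, "total": 10}, {"id": 1, "total": 5}, {"id": 3, "total": 7}]
pairs = hash_join(users, orders, "id")
```

And that's before adding pushdown, statistics, and a planner that decides when any of this is worth it.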

A hand-rolled version can of course outperform a generic engine like this, but in most cases making it faster is more work, not less.

DuckDB can also hand you an in-memory representation (e.g. `fetch_arrow_table()`), so you pay less of the overhead of wrapping results in your language's native data structures, and you can do the filtering yourself on that. In most cases, though, pushing the filter into a SQL `WHERE` clause will still win.
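A small sketch of the two options: fetch everything and filter in app code, versus putting the filter in `WHERE` so only matching rows ever cross into Python. To stay dependency-free it uses Python's built-in `sqlite3` as a stand-in for DuckDB (DuckDB's Python API has the same `connect()` / `execute()` / `fetchall()` shape). The table and data are made up. It only shows *where* the filter runs; sqlite3 won't demonstrate DuckDB's columnar scans or Parquet row-group skipping.

```python
import sqlite3

# Stand-in for DuckDB: stdlib sqlite3, in-memory, with a made-up events table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER, kind TEXT, amount REAL)")
con.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(i, "click" if i % 10 else "purchase", float(i)) for i in range(1, 1001)],
)

# Option A: pull every row into Python tuples, then filter in application code.
all_rows = con.execute("SELECT id, kind, amount FROM events ORDER BY id").fetchall()
filtered_in_app = [r for r in all_rows if r[1] == "purchase" and r[2] > 500]

# Option B: push the filter into WHERE so the engine only returns matching rows.
filtered_in_sql = con.execute(
    "SELECT id, kind, amount FROM events WHERE kind = ? AND amount > ? ORDER BY id",
    ("purchase", 500),
).fetchall()
```

Both give the same answer, but option A materializes all 1000 rows as Python objects to keep 50 of them.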

The SELECT machinery *is* the product with databases! SQL is often the shortest description of the processing logic, and the database has an efficient local execution engine that can prune or reduce the data it reads based on the plan. That's very hard to match in application code, especially once joins get involved.
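As a rough illustration of "prune the data read based on the plan", here's a toy version of the min/max-statistics trick Parquet readers use to skip whole row groups for a `value > threshold` filter. `RowGroup` and `scan_greater_than` are hypothetical names, not any library's API; in a real file the min/max live in the footer metadata, so skipped groups are never read at all.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    # Hypothetical stand-in for a Parquet row group: data plus stats recorded at write time.
    values: List[int]
    min_val: int
    max_val: int

    @classmethod
    def from_values(cls, values: List[int]) -> "RowGroup":
        return cls(values, min(values), max(values))


def scan_greater_than(groups: List[RowGroup], threshold: int) -> Tuple[List[int], int]:
    """Return values > threshold and how many row groups actually had to be read."""
    out: List[int] = []
    groups_read = 0
    for g in groups:
        # Prune using stats only: if even the max is <= threshold, nothing here can match.
        if g.max_val <= threshold:
            continue
        groups_read += 1
        out.extend(v for v in g.values if v > threshold)
    return out, groups_read


# Ten row groups of 100 consecutive ints each: 0-99, 100-199, ..., 900-999.
groups = [RowGroup.from_values(list(range(s, s + 100))) for s in range(0, 1000, 100)]
result, n_read = scan_greater_than(groups, 850)
```

Only the last two groups get read here. The engine does this (and the join choices) for you from the plan, which is the point about it being hard to match in app code.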