upvote
It uses HTTP/2, and it has streaming.
reply
They mention in the benchmarks section that the network they're on is an "up to" 15 Gbps connection, so maxing out 50 GB/s is not realistic.

I agree they should have also listed the compressed size of the table instead of only the CSV size. But the compressed dataset is probably not smaller than 1/10 of the CSV size. If that's the case, they're transferring ~8 GB in 4.6 s over a ~2 GB/s (15 Gbps) connection, which seems pretty close to max.
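A quick sanity check of that arithmetic. The ~8 GB payload is an assumption (CSV size divided by 10, per the guess above), not a measured figure:

```python
# Rough sanity check: effective throughput vs. the "up to" 15 Gbps link.
# The ~8 GB payload is an assumed figure (CSV size / 10), not a measured one.
payload_gb = 8      # assumed compressed transfer size in GB
transfer_s = 4.6    # reported transfer time in seconds
link_gbps = 15      # advertised link speed

throughput_gbps = payload_gb * 8 / transfer_s  # GB -> Gb
utilization = throughput_gbps / link_gbps

print(f"{throughput_gbps:.1f} Gbps ({utilization:.0%} of line rate)")
```

Under those assumptions it works out to roughly 14 Gbps, i.e. most of the advertised line rate.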

reply
That makes sense. I meant to write 50 Gbps. I don't mean they should reach that; I mean any reasonably efficient protocol would.

The size of the dataset should be under 3 GB in Parquet, from what I understand. [0]

So it did 3 × 8 / 4.94 ≈ 4.86 Gbps, which is underwhelming in terms of network performance.
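The same back-of-the-envelope math, assuming the ~3 GB Parquet size from [0] and the 4.94 s runtime:

```python
# Effective throughput assuming the ~3 GB Parquet size cited in [0].
parquet_gb = 3      # assumed dataset size in Parquet, from the linked docs
transfer_s = 4.94   # reported runtime in seconds

gbps = parquet_gb * 8 / transfer_s  # GB -> Gb
print(f"{gbps:.2f} Gbps")  # well under what a 15 Gbps link supports
```

That is roughly a third of the available link capacity, which is the basis for calling it underwhelming.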

It's still not possible to draw any conclusions, since we don't know specifically how they encode it or how they're running the query.

I just mean this write-up is useless from an engineering perspective, and what it says about HTTP doesn't make sense either.

[0] - https://clickhouse.com/docs/getting-started/example-datasets...

reply
Agreed, that does seem a bit underwhelming. Hopefully there are some performance gains to be made before the production release in September.
reply
They also wanted the protocol to work with DuckDB-Wasm in the browser. I can't comment on the performance side, but that consistency piece is pretty key to DuckDB's value proposition, I think.
reply
I really like DuckDB, and sorry to pile on, but the parent makes some strong points. I wonder whether MotherDuck builds on HTTP as well?
reply
The parent reads more like "it works in practice but does it work in theory?" The innovations that have come out of the DuckDB team seem to always focus on "in practice" instead of focusing on how things are supposed to (or are expected to) be done.
reply
No, we don't (source: I work at MotherDuck).
reply