Are you talking about Andy Pavlo bet here?

https://news.ycombinator.com/item?id=29737326

The Kuzu folks took some of these discussions and implemented them: SIP, ASP joins, factorized joins, and WCOJ (worst-case optimal joins).
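For anyone unfamiliar with WCOJ: the idea is to bind one variable at a time and intersect adjacency sets, instead of materializing pairwise joins that can blow up. A toy sketch for the triangle query (illustrative only, nothing like Kuzu's actual implementation):

```python
from collections import defaultdict

def triangles(edges):
    """Generic-join-style enumeration of Q(a,b,c) :- R(a,b), R(b,c), R(a,c).
    Each variable is bound via set intersection, never a full pairwise join."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
    out = []
    for a in list(adj):                          # bind a
        for b in adj[a]:                         # bind b: R(a, b)
            # bind c by intersecting R(b, *) with R(a, *)
            for c in adj.get(b, set()) & adj[a]:  # R(b, c) and R(a, c)
                out.append((a, b, c))
    return out

print(triangles([(1, 2), (2, 3), (1, 3), (3, 4)]))  # [(1, 2, 3)]
```

The pairwise-join version would first materialize all two-hop paths (which can be quadratic) and then filter; the intersection keeps the intermediate result no larger than the output, which is the worst-case-optimal guarantee.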

Internally it's structured very similarly to DuckDB, apart from the differences noted above.

DuckDB 1.5 implemented sideways information passing (SIP). And LadybugDB is bringing in support for DuckDB node tables.
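The core of SIP, roughly: the build side of a hash join passes its key set "sideways" to the probe-side scan (in practice usually as a Bloom filter), so non-matching rows are dropped before they reach the join. A plain-set sketch, not DuckDB's actual code:

```python
# Build side is small; probe side is a big scan.
build_rows = [(1, "a"), (3, "b")]
probe_rows = [(k, k * 10) for k in range(1_000)]

# The filter "passed sideways" from build to probe scan.
build_keys = {k for k, _ in build_rows}
filtered_probe = [r for r in probe_rows if r[0] in build_keys]

# The join itself now touches only 2 rows instead of 1,000.
ht = {k: v for k, v in build_rows}
joined = [(k, v, ht[k]) for k, v in filtered_probe]
print(joined)  # [(1, 10, 'a'), (3, 30, 'b')]
```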

So the idea that graph databases have shaky internals stems primarily from pre-2021 incumbents.

4 more years to go to 2030!

reply
I wasn't referring to the Pavlo bet but I would make the same one! Poor algorithm and architecture scalability is a serious bottleneck. I was part of a research program working on the fundamental computer science of high-scale graph databases ~15 years ago. Even back then we could show that the architectures you mention couldn't scale even in theory. Just about everyone has been re-hashing the same basic design for decades.

As I like to point out, for two decades DARPA has offered to pay many millions of dollars to anyone who can demonstrate a graph database that can handle a sparse trillion-edge graph. That data model easily fits on a single machine. No one has been able to claim the money.

Inexplicably, major advances in this area 15-20 years ago under the auspices of government programs never bled into the academic literature even though they materially improved the situation. (This is the best example I've seen of obviously valuable advanced research that became lost for mundane reasons, which is pretty wild if you think about it.)

reply
> many millions of dollars to anyone who can demonstrate a graph database that can handle a sparse trillion-edge graph.

I wonder why no one has claimed it. It's possible to compress large graphs to ~1 byte per edge via graph-reordering techniques. So a trillion-edge graph becomes ~1 TB, which fits on high-end machines.

Obviously it won't handle high write rates and mutations well. But with Apache Arrow-based compression, it's certainly possible to handle read-only and read-mostly graphs.
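Back-of-envelope on the ~1 byte/edge claim: after a locality-friendly reordering, neighbor IDs cluster, so you sort each adjacency list, delta-encode the gaps, and write them as LEB128-style varints; small gaps then fit in a single byte. A minimal sketch (illustrative, not any engine's actual format):

```python
def varint(n):
    """LEB128-style variable-length encoding of a non-negative int."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | (0x80 if n else 0))
        if not n:
            return bytes(out)

def encode_neighbors(neighbors):
    """Delta-encode a sorted adjacency list as varint gaps."""
    prev, buf = 0, bytearray()
    for v in sorted(neighbors):
        buf += varint(v - prev)
        prev = v
    return bytes(buf)

# A well-clustered neighborhood: every gap fits in one byte.
nbrs = [100, 101, 103, 104, 108, 110]
blob = encode_neighbors(nbrs)
print(len(blob), "bytes for", len(nbrs), "edges")  # 6 bytes for 6 edges
```

The reordering is the hard part; the encoding only pays off once most gaps are small.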

Also, the single-machine constraint feels artificial. For any columnar database written in the last 5 years, implementing object-store support is table stakes.

reply
Source: https://www.theregister.com/2023/03/08/great_graph_debate_we...

> There are some additional optimizations that are specific to graphs that a relational DBMS needs to incorporate: [...]

This is essentially what Kuzu implemented, and what DuckDB tried to implement (DuckPGQ), without touching relational storage.

The jury is still out on which approach is better.

reply
Yes, a graph database will happily lead you down an n^3 (or worse!) path when trying to query for a single relation if you are not wise about your indexes, etc.
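A toy illustration of that blowup: matching a two-hop path by rescanning the full edge list at every hop is O(E^2) (three hops makes it O(E^3)), while an adjacency index makes each hop a direct lookup. Hypothetical example, not any particular engine's plan:

```python
from collections import defaultdict

edges = [(1, 2), (2, 3), (2, 4), (4, 5)]

# Naive: every hop rescans all edges -> O(E^2) for a 2-hop pattern.
naive = [(a, b, c)
         for a, b in edges
         for b2, c in edges
         if b == b2]

# Indexed: adjacency lookup per hop -> work proportional to the output.
adj = defaultdict(list)
for u, v in edges:
    adj[u].append(v)
indexed = [(a, b, c) for a, b in edges for c in adj[b]]

print(sorted(naive) == sorted(indexed))  # True, but very different costs
```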
reply
That sounds like a "graph" DB which implements edges as separate tables, like building a graph in a standard SQL RDB.

If you wish to avoid that particular caveat, look for a graph DB which materializes edges within vertices/nodes. The obvious caveat there is that the edges are not normalized, which may or may not be an issue for your particular application.
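The two layouts side by side, in a deliberately simplified form (hypothetical data structures, not any engine's storage format):

```python
# 1) Normalized edge table (relational style): traversal = scan/join.
edge_table = [("alice", "bob"), ("alice", "carol"), ("bob", "carol")]

def follows_join(user):
    return [dst for src, dst in edge_table if src == user]  # table scan

# 2) Edges materialized inside the vertex record: traversal = lookup.
vertices = {
    "alice": {"follows": ["bob", "carol"]},
    "bob":   {"follows": ["carol"]},
    "carol": {"follows": []},
}

def follows_embedded(user):
    return vertices[user]["follows"]  # direct access, but denormalized

print(follows_join("alice") == follows_embedded("alice"))  # True
```

Same answers either way; the embedded form trades update cost (an edge lives in the vertex record) for traversal speed.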

reply
Are you talking about the query plan for scanning the rel table? Kuzu uses a hash index and a join.

Trying to make it optional.

Try:

    explain match (a)-[b]->(c) return a.rowid, b.rowid, c.rowid;

reply
It certainly does seem problematic to have a graph database hiding edges, sharp or not.
reply