https://www.thenile.dev/blog/uuidv7#why-uuidv7 has some details: "UUID versions that are not time ordered, such as UUIDv4 (described in Section 5.4), have poor database-index locality. This means that new values created in succession are not close to each other in the index; thus, they require inserts to be performed at random locations. The resulting negative performance effects on the common structures used for this (B-tree and its variants) can be dramatic."
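
To make the locality point concrete, here is a rough sketch that treats a sorted Python list as a stand-in for a B-tree index and counts how many inserts land at the right-hand end. The key layout (a millisecond counter in the high bits, random low bits) is a simplification of UUIDv7 rather than the exact format, and `tail_insert_fraction` is just an illustrative helper name.

```python
import bisect
import random


def tail_insert_fraction(keys):
    """Return the fraction of inserts that land at the end of a sorted index."""
    index = []
    tail_hits = 0
    for key in keys:
        pos = bisect.bisect_left(index, key)
        if pos == len(index):
            tail_hits += 1
        index.insert(pos, key)
    return tail_hits / len(keys)


rng = random.Random(0)
n = 2000

# Fully random 128-bit keys, standing in for UUIDv4 values.
random_keys = [rng.getrandbits(128) for _ in range(n)]

# Time-ordered keys (assumption: one key per millisecond tick), with the
# timestamp in the high 48 bits and 80 random bits below it.
base_ms = 1_700_000_000_000
time_ordered_keys = [((base_ms + i) << 80) | rng.getrandbits(80) for i in range(n)]

random_frac = tail_insert_fraction(random_keys)
ordered_frac = tail_insert_fraction(time_ordered_keys)
print(f"random keys appended at tail:       {random_frac:.3f}")
print(f"time-ordered keys appended at tail: {ordered_frac:.3f}")
```

With time-ordered keys every insert is an append, whereas random keys scatter across the whole index. A real B-tree works on pages rather than individual slots, so treat this as an intuition aid, not a benchmark.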

Also mentioned on HN https://news.ycombinator.com/item?id=45323008

reply
In more practical terms:

1. Users - your users table may not benefit from being ordered by a created_at (or uuidv7) index, because whether you need to query a given row depends on the user's activity rather than on when they first onboarded.

2. Orders - the majority of your queries are on recent orders or are historical reporting-type queries, both of which should benefit from a created_at (or uuidv7) index.

Obviously the argument is then that you're leaking data in the key, but my personal take is that this is overstated. You might not want to tell people how old a User is, but you're pretty much always going to tell them how old an Order is.

reply
It's memory and disk paging both.

There's also a hot spot problem with databases. That's the performance problem with autoincrement integers. If you are always writing to the same page on disk, then every write has to lock the same page.

UUIDv7 is a trade-off between a messy B-tree (page splits) and a write page hot spot (latch contention). It's always on the right side of the B-tree, but it's spread out more to avoid hot spots.

That still doesn't mean you should always use v7. It reversibly encodes a timestamp, and it could be used to estimate the rate at which IDs are generated (analogous to the German tank problem). If the UUIDv7 generator is monotonic, this issue gets worse.
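
For anyone who wants to see how reversible the timestamp is, here is a small sketch. It builds a UUIDv7-shaped value by hand following the RFC 9562 field layout (48-bit Unix millisecond timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits), since `uuid.uuid7()` is not available in older Python versions. `make_uuid7`, `extract_unix_ts_ms` and `created_at` are illustrative names, not library functions.

```python
import os
import uuid
from datetime import datetime, timezone


def make_uuid7(unix_ts_ms: int) -> uuid.UUID:
    """Build a UUIDv7-shaped value for a given Unix timestamp in milliseconds."""
    rand_a = int.from_bytes(os.urandom(2), "big") & 0x0FFF           # 12 bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)  # 62 bits
    value = (unix_ts_ms & ((1 << 48) - 1)) << 80  # timestamp in the top 48 bits
    value |= 0x7 << 76                            # version field = 7
    value |= rand_a << 64
    value |= 0b10 << 62                           # variant field = 0b10
    value |= rand_b
    return uuid.UUID(int=value)


def extract_unix_ts_ms(u: uuid.UUID) -> int:
    """Recover the millisecond timestamp: it is simply the top 48 bits."""
    return u.int >> 80


def created_at(u: uuid.UUID) -> datetime:
    """Convert the embedded timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(extract_unix_ts_ms(u) / 1000, tz=timezone.utc)


example = make_uuid7(1_700_000_000_000)
print(example, created_at(example).isoformat())
```

Anyone holding two such IDs can subtract the timestamps to get the elapsed time between them, which is the starting point for the rate estimation mentioned above.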

reply
v7 exposes the creation date, and maybe you don't want that. So it depends on the use case.
reply
I think I read something once about using v7 internally and exposing v4 in your API.
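
A minimal sketch of what that could look like, using SQLite from the standard library. The `orders` table, its columns, and the helper names (`new_uuid7`, `create_order`, `get_order_total`) are all made up for illustration, and the v7 generator is hand-rolled from the RFC 9562 layout rather than taken from a library.

```python
import os
import sqlite3
import time
import uuid


def new_uuid7() -> uuid.UUID:
    """Hand-rolled UUIDv7: millisecond timestamp on top, random bits below."""
    ts_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")       # 80 random bits
    value = ((ts_ms & ((1 << 48) - 1)) << 80) | rand
    value = (value & ~(0xF << 76)) | (0x7 << 76)       # set version to 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)       # set variant to 0b10
    return uuid.UUID(int=value)


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        id          BLOB PRIMARY KEY,      -- internal UUIDv7, never leaves the backend
        public_id   TEXT NOT NULL UNIQUE,  -- UUIDv4 exposed through the API
        total_cents INTEGER NOT NULL
    )
    """
)


def create_order(conn: sqlite3.Connection, total_cents: int) -> str:
    """Insert an order and return only its public (v4) identifier."""
    internal_id = new_uuid7()
    public_id = uuid.uuid4()
    conn.execute(
        "INSERT INTO orders (id, public_id, total_cents) VALUES (?, ?, ?)",
        (internal_id.bytes, str(public_id), total_cents),
    )
    return str(public_id)


def get_order_total(conn: sqlite3.Connection, public_id: str):
    """Look an order up by the identifier the API client was given."""
    row = conn.execute(
        "SELECT total_cents FROM orders WHERE public_id = ?", (public_id,)
    ).fetchone()
    return row[0] if row else None


order_ref = create_order(conn, 1299)
print(order_ref, get_order_total(conn, order_ref))
```

One caveat: the unique index on `public_id` is still keyed on random values, so the scattered-insert cost moves to that secondary index rather than disappearing.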
reply
Or even an autoincrement int primary key internally. It depends on your scale, environment, etc., but it still fits plenty of use cases.
reply
In distributed databases I've worked with, there's usually something like a B-tree per key range, but there can be thousands of key ranges distributed over all the nodes in the cluster in parallel, each handling modifications in an LSM. The goal there is to distribute the storage and processing over all nodes equally, and predictable/clustered IDs are bad at that. That's different to the Postgres/MySQL scenario where you have one large B-tree per index.
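
A toy illustration of that effect, assuming a 64-bit key space split into 8 equal key ranges (real systems split and rebalance ranges dynamically, which this ignores). `range_for_key` and the numbers are invented for the sketch.

```python
import bisect
import random
from collections import Counter

KEY_BITS = 64
NUM_RANGES = 8

# Upper-exclusive split points dividing the key space into equal ranges.
boundaries = [(i * (1 << KEY_BITS)) // NUM_RANGES for i in range(1, NUM_RANGES)]


def range_for_key(key: int) -> int:
    """Return the index of the key range (0..NUM_RANGES-1) that owns this key."""
    return bisect.bisect_right(boundaries, key)


rng = random.Random(0)
n = 1000

# Clustered keys: a burst of consecutive IDs, as a time-ordered scheme produces.
start = 5 * (1 << 60)
clustered_keys = [start + i for i in range(n)]

# Random keys: spread uniformly over the whole key space.
random_keys = [rng.getrandbits(KEY_BITS) for _ in range(n)]

clustered_load = Counter(range_for_key(k) for k in clustered_keys)
random_load = Counter(range_for_key(k) for k in random_keys)

print("clustered keys per range:", dict(clustered_load))
print("random keys per range:   ", dict(sorted(random_load.items())))
```

All the clustered writes hit a single range (and so a single node), while the random keys are spread over every range.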
reply