In practice, most orgs with sufficiently large and complex data models use the term "UUID" to mean a pure 128-bit value that makes no reference to the UUID standard. It is not difficult to find yourself with a set of application requirements that cannot be satisfied with a standardized UUID.
The sophistication of our use case scenarios for UUIDs exceeds their original design assumptions. They don't readily support every operation you might want to do on a UUID.
Using MD5 or 122 bits of a SHA1 hash seems questionable now that both algorithms have known collisions. Using 122 bits of a SHA2/3 seems pretty limited too. Maybe if you've got trusted inputs?
Let's say i have some entity like an "organization" that has data that spans several different tables. I want to use that organization as a "parent" in such a way where i can clone them to create new "child" organizations structured the same way they are. I also want to periodically be able to pull changes from the parent organization down into the child organization.
If the primary keys for all tables involved are UUIDs, I can accomplish this very easily by mapping all IDs in the relevant tables `id => uuid5(id, childOrgId)`. This can be done to all join tables, foreign keys, etc. The end result is a perfect "child" clone of the organization with all data relations still in place. This data can be refreshed from the parent organization any time simply by repeating the process.
v7 rough ordering also helps as a PK in certain sharded DBs, while others want random, or nonsharded ones usually just serial int.
For those ~~curious~~ worried, no, this was not a security sensitive context.
It's the same reason we use UTF-8. It's well supported. UUIDs are well supported by most languages and storage systems. You don't have to worry about endianness or serialization. It's not a thing you have to think about. It's already been solved and optimized.
Now generate your random ID. Did you use a CSPRNG, or were your devs lazy and just used a PRNG? Are you doing that every time you're generating one of these IDs in any system that might need to communicate with your API? Or maybe they just generated one random number, and now they're adding 1 every time.
Now transfer it over a wire. Are you sure the way you're serializing it is how the remote system will deserialize it? Maybe you should use a string representation, since character transmission is a solved problem with UTF-8. OK, so who decides what that canonical representation is? How do we make it recognizable as an ID without looking like something that people should do arithmetic with?
It's not like random IDs were a new idea in 2002.
But, using UUIDv4 shouldn't be rocket science, either. UUID support should be built in to a language intended for web applications, database applications, or business applications. That's why you're using Go or C# instead of C. And Go is somewhat focused on micro-service architectures. It's going to need to serialize and deserialize objects regularly.
> Are you sure the way you're serializing it is how the remote system will deserialize it?
It's 16 bytes. There's no serialization.
Hex encoding with hyphens in the right spot isn't serialization?
A downvote tells me nothing. Please tell me what I'm missing, maybe I could learn something
Ah, here we are. If it's just bytes, why store it as a string? Sixteen bytes is just a 128-bit integer, don't waste the space. So now the DB needs to know how to convert your string back to an integer. And back to a string when you ask for it.
"Well why not just keep it as an integer?"
Sure, in which base? With leading zeroes as padding?
But now you also need to handle this in JavaScript, where you have to know to deserialize it to a Bigint or Buffer (or Uint8Array).
UUIDs just mean you don't need to do any of this crap yourself. It's already there and it already works. Everything everywhere speaks the same UUIDs.
(Downvote wasn't me)
Also mentioned on HN https://news.ycombinator.com/item?id=45323008
1. Users - your users table may not benefit by being ordered by created_at ( or uuid7 ) index because whether or not you need to query that data is tied to the users activity rather than when they first on-boarded.
2 Orders - The majority of your queries on recent orders or historical reporting type query which should benefit for a created_at ( or uuidv7 ) index.
Obviously the argument is then you're leaking data in the key, but my personal take is this is over stated. You might not want to tell people how old a User is, but you're pretty much always going to tell them how old an Order is.
There's also a hot spot problem with databases. That's the performance problem with autoincrement integers. If you are always writing to the same page on disk, then every write has to lock the same page.
Uuidv7 is a trade off between a messy b-tree (page splits) and a write page hot spot (latch contention). It's always on the right side of the b-tree, but it's spread out more to avoid hot spots.
That still doesn't mean you should always use v7. It does reversibly encode a timestamp, and it could be used to determine the rate that ids are generated (analogous to the German tank problem). If the uuidv7 is monotonic, then it's worse for this issue.
> To be clear, UUIDv8 is not a replacement for UUIDv4 (Section 5.4) where all 122 extra bits are filled with random data.
> UUIDv8's uniqueness will be implementation specific and MUST NOT be assumed.
Here's a spec compliant UUIDv8 implementation I made that doesn't produce unique IDs: https://github.com/robalexdev/uuidv8-xkcd-221
So, given a spec-compliant UUIDv4 you can assume it is unique, but you'd need out-of-band information to make the same assumption about a UUIDv8.
I wrote much more in a blog post: https://alexsci.com/blog/uuid-oops/