In modern C++, the technically "correct" and safe way to spell this trick is exactly as you suggested: using uintptr_t (or intptr_t).
(is the current version of that paper. The tracking ticket insisted there's a P3125R5 and that LEWG had seen it in 2025, but it isn't listed in a mailing, so it might be a mirage.)
You know it's a Hana paper because it wants this to be allowed at compile time (C++ constexpr), but joking aside, this seems like a nice approach for C++ that stays agnostic about future implementation details.
Maybe that is not the correct C++ terminology, I'm more familiar with how provenance works in Rust, where large parts of it got stabilised a little over a year ago. (What was stabilised was "strict provenance", which is a set of rules that, if you abide by them, will definitely be correct, but it is possible the rules might be loosened in the future to be more lenient.)
The problem of pointer provenance is more finding a workable theoretical model rather than one causing miscompiles on realistic code. While there are definitely miscompiles on carefully constructed examples, I'm not aware of any bugs on actual code. This is in comparison to topics like restrict(/noalias) semantics or lifetime semantics, where there is a steady drip of bug reports that turn out to be actual optimization failures.
But the likely destiny of C++ is to inherit the provenance rules that are an adjunct to C23: PNVI-ae-udi, which stands for Provenance Not Via Integers, Addresses Exposed, User Disambiguates.
As that name suggests, in this model provenance is not transmitted via integers. Every 123456 is always just the integer 123456 and there aren't magic 123456 values which are different and transmit some form of provenance from a pointer to some value which happened perhaps to be stored at address 123456 in memory.
However, PNVI-ae-udi has Exposure: if we exposed the pointer in an approved way, the associated provenance is somehow magically "out there" in the ether. Once the pointer has been exposed, just having that integer 123456 works fine, because the integer 123456 is combined with that provenance from the ether to make a working pointer. User disambiguation means that the compiler has to give you the "benefit of the doubt". For example, if you could mean to make a pointer either to that Doodad which stopped existing a minute ago or to this other Doodad which does exist, benefit of the doubt means it was the latter, so your pointer is valid even though both Doodads had the same address.
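For a concrete feel of the expose-then-recover shape, here is a minimal sketch in Rust, whose exposed-provenance functions (stable since 1.84) work along similar lines. The function name round_trip_via_integer is made up for illustration, and the sketch does not claim Rust's rules match PNVI-ae-udi in every detail.

```rust
/// Hypothetical helper: sends a pointer through a plain integer and back.
fn round_trip_via_integer(value: &mut u32) -> u32 {
    let ptr: *mut u32 = value;
    // Exposing: we get an ordinary integer address, and the pointer's
    // provenance is recorded as "exposed".
    let addr: usize = ptr.expose_provenance();
    // The integer itself carries no provenance. This call asks the
    // implementation to pick up a previously exposed provenance for it.
    let back: *mut u32 = std::ptr::with_exposed_provenance_mut(addr);
    // SAFETY: `back` refers to `value`, which is live and exclusively
    // borrowed for the duration of this function.
    unsafe {
        *back += 1;
        *back
    }
}
```

Calling it with a variable holding 41 leaves 42 in the variable and returns 42, even though the pointer spent part of its life as a bare usize.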
There's a competing proposal in C++ land to add provenance via angelic nondeterminism: if there's some provenance that makes the code non-UB, then use that provenance. (As you might imagine, I'm not a big fan of that proposal, but WG21 seems to love it a lot more than I do.)
Angelic non-determinism seems difficult to use to determine if an optimisation is valid. If I understand this correctly, it is basically the as-if rule, but in this case applied to something that potentially needs global program analysis. Would that be an accurate understanding?
It sounds to me like both of these proposals will be strictly less able to optimize than strict provenance in Rust. In particular, Rust allows applying a closure/lambda to map a pointer while keeping the provenance. That avoids exposing the provenance as you add and remove tag bits, which should at least in theory allow LLVM to optimise better. (But this keeps the value as a pointer, and having a dangling pointer that you don't access is fine in Rust, probably not in C?)
I'm not sure why I'm surprised actually, Rust can be a more sensible language in places thanks to hindsight. We see this in being able to use LLVM noalias (restrict basically) in more places thanks to the different aliasing model, while still not having the error-prone TBAA of C and C++. And it doesn't need a model of memory consisting of typed objects (rather it is all just bytes and gets materialised into a value of a type on access).
https://doc.rust-lang.org/std/primitive.pointer.html#method....
Rust's MIRI is able to run code which uses this (a strict provenance API) because although MIRI's pointers are some mysterious internal type, it can track that we mapped them to hide our tags, and then later mapped back from the tagged pointer to recover our "real" pointer and see that's fine.
This isn't an unsafe operation. Dereferencing a pointer is unsafe, but twiddling the bits is fine, it just means whoever writes the unsafe dereferencing part of your codebase needs to be very careful about these pointers e.g. making sure the ones you've smuggled a tag in aren't dereferenced 'cos that's Undefined Behaviour.
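A minimal sketch of that mapping, using the strict provenance methods addr and map_addr on raw pointers. The helper names and the choice of two tag bits are assumptions for illustration; two bits are free here only because u32 is assumed to be 4-byte aligned on the target.

```rust
// Assumption: u32 is 4-byte aligned, so the low two address bits are zero.
const TAG_MASK: usize = 0b11;

/// Hides `tag` in the low bits. No `unsafe` needed: we only twiddle bits.
fn with_tag(ptr: *mut u32, tag: usize) -> *mut u32 {
    debug_assert!(tag <= TAG_MASK);
    // map_addr changes the address but keeps the original provenance.
    ptr.map_addr(|addr| (addr & !TAG_MASK) | tag)
}

/// Reads the tag back without turning the pointer into an exposed integer.
fn tag_of(ptr: *mut u32) -> usize {
    ptr.addr() & TAG_MASK
}

/// Recovers the "real" pointer, which is the only one safe to dereference.
fn without_tag(ptr: *mut u32) -> *mut u32 {
    ptr.map_addr(|addr| addr & !TAG_MASK)
}
```

All three helpers are safe functions. The unsafe part is still the eventual dereference, and it has to go through without_tag first.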
It's clear to me how this works in Rust; it's just still unclear in C++.
Is doing that manually worth it? Usually not, but for some core types (classical example is strings) or in language runtimes it can be.
Would it be awesome if this could be done automatically? Absolutely, but I understand it is a large change, and the plan is to later build upon the pattern types that are currently work in progress (and would allow you to specify custom ranged integer types).
So that's one tiny use of this sort of idea which is guaranteed unnecessary in Rust, and indeed although it isn't guaranteed the optimiser will typically spot less obvious opportunities so that Option<Option<bool>> which might be None, or Some(None) or Some(Some(true)) or Some(Some(false)) is the same size (one byte) as bool.
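These sizes are easy to check. In the sketch below (the function name is made up), the Option<&u8> case is the documented guarantee, while Option<Option<bool>> being one byte is what current rustc does in practice rather than something the language promises.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns (reference, optional reference, bool, nested option) sizes.
fn niche_sizes() -> (usize, usize, usize, usize) {
    (
        size_of::<&u8>(),
        // Guaranteed: None is represented as the null pointer.
        size_of::<Option<&u8>>(),
        size_of::<bool>(),
        // Not guaranteed: the optimiser uses spare bit patterns of bool.
        size_of::<Option<Option<bool>>>(),
    )
}
```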
But hiding stuff in a pointer is applicable in places where your Rust compiler won't try to take advantage unless you do something like this. A novel tiny String-like type I saw recently, ColdString (https://crates.io/crates/cold-string), does this. ColdString is 8 bytes. If your text is 8 or fewer bytes of UTF-8 then you're done, that'll fit. If you have more text, ColdString allocates on the heap to store not only your text but also its length, so it needs to actually "be" in some sense a raw pointer to that structure. If the string is shorter, that pointer is nonsense: we've hidden our text in the pointer itself.
Implementation requires knowing how pointers work, and how UTF-8 encoding works. I actually really like one of the other Rust tiny strings, CompactString, but if you have a lot of very small strings (e.g. UK postcodes fit great) then ColdString might be as much as three times smaller than your existing Rust or C++ approach, and it's really hard to beat that for such use cases.
Edited: To remove suggestion ColdString has a distinct storage capacity, this isn't intended as a conventional string buffer, it can't grow after creation
https://blog.rust-lang.org/2025/01/09/Rust-1.84.0/#strict-pr...
If you don't care about portability or using every theoretically available bit then it is trivial. A maximalist implementation must be architecture aware and isn't entirely knowable at compile-time. This makes standardization more complicated since the lowest common denominator is unnecessarily limited.
In C++ this really should be implemented through a tagged pointer wrapper class that abstracts the architectural assumptions and limitations.
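To make the wrapper idea concrete, here is a rough sketch of the shape, written in Rust rather than C++. TaggedPtr and its methods are hypothetical names. It only uses the low bits freed up by alignment, so it is the portable lowest-common-denominator variant, not the maximalist one that also exploits unused high address bits on a given architecture.

```rust
use std::mem::align_of;

/// Hypothetical wrapper: keeps a small tag in the alignment bits of a `*mut T`.
pub struct TaggedPtr<T> {
    raw: *mut T,
}

impl<T> TaggedPtr<T> {
    /// Bits that are always zero in a correctly aligned `*mut T`.
    pub const TAG_MASK: usize = align_of::<T>() - 1;

    /// Returns `None` if the tag does not fit or the pointer is misaligned.
    pub fn new(ptr: *mut T, tag: usize) -> Option<Self> {
        if tag > Self::TAG_MASK || (ptr.addr() & Self::TAG_MASK) != 0 {
            return None;
        }
        // map_addr keeps the provenance of `ptr` while changing its address.
        Some(Self { raw: ptr.map_addr(|a| a | tag) })
    }

    /// The tag that was stored at construction.
    pub fn tag(&self) -> usize {
        self.raw.addr() & Self::TAG_MASK
    }

    /// The original pointer with the tag bits cleared again.
    pub fn ptr(&self) -> *mut T {
        self.raw.map_addr(|a| a & !Self::TAG_MASK)
    }
}
```

The point of the wrapper is that callers never see the mask: a type with 1-byte alignment simply gets zero tag bits and new refuses any non-zero tag, instead of silently corrupting the address.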