The thing is, most real-world numbers will fit within 1-3 bytes (even at 7 bits per byte), so ultradense packing doesn't actually buy much outside of benchmarks.
I spent WAYYYYYYYY too much time exploring this...
Stored and transmitted data also has to be decoded. With modern datacenter hardware bottleneck is often CPU rather than network or disk (SSD). It depends on specific properties of the data. (I used to work on search index implementation which is about decoding and intersecting large amounts of hit-lists; and right SIMD-friendly varint encoding is obviously crucial)