If you haven't come across Kvrocks yet, it may be worth a look: https://github.com/apache/kvrocks https://kvrocks.apache.org/ . It's a database with a Redis-compatible wire protocol, but the database is stored on disk. This means your working set is not limited by RAM and can be a few orders of magnitude larger! On modern SSDs this is still very fast. I think it improves the durability story as well. But the big win is the orders of magnitude larger database space.
As I've been improving my side project https://totalrealreturns.com/ recently I've ended up using both Redis and Kvrocks together. Redis is great for small global state that needs to be super fast. Kvrocks is great for larger bulk data storage (large precomputed datasets), but also supports a lot of the Redis data structures as well as Lua scripts.
For example if you use it for session storage, you can't have your application read from a random instance that may or may not contain the session.
Now you could solve this specific case by sharding by prefix, or by querying all instances, but then you still do not have high availability: if the instance a specific session is on is down, these users cannot authenticate. At that point you’re better off with a single instance.
You, obviously, don't commit important data only to a session that you can loose, if the application does not allow it.
We use redis as infrastructure. To route events and as a cache.
For us redis could go down and we would merely see a degradation of our service with no data loss.
I recommend using redis like that. And then use a database that supports transactions for real data problems.
But we are different. And that's OK.
But that requires running on multiple instances, which in turn requires to share the data across all replicas.
Just because it works for your use case right now doesn’t mean there isn’t room for improvements to support others too.
Oh good, then you don't need to do any of the stuff that you suggested to do
The app would look up in both databases. If it exists in any, there would be a session.
Thisnis strictly different from partitioning which I think you are mixing it up with.
Paritioning is for performance not HA
And if you find the session with differing values in both databases, how do you know which one is up-to-date?
You need an algorithm to pick which data is right, such as electing a master instance.
And that brings us back to the original discussion: to manage sessions (unlike caches) in a highly available way, you need to setup HA (or reimplement it, which obviously is a bad idea). You can't read round robin from multiple non-HA instances.
There is a whole slew of downstream things you need to take into consideration.
Will millions of users, high availability is critical for this functionality.