Having implemented my share of highly complex, high-performance algorithms in the past, I've found the key was always figuring out how to massage the raw data into structures that let the algorithm fly. It requires a decent knowledge of the algorithm options you have, as well as the flexibility to see that the data could be presented a different way to reach the same result orders of magnitude faster.
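To make that concrete, here's a minimal, hypothetical sketch (names and data are made up, not from the parent comment): the algorithm barely changes, but reshaping the input into an index keyed by what the loop actually looks up changes the complexity class.

```python
from collections import defaultdict

orders = [
    {"customer_id": 1, "total": 30.0},
    {"customer_id": 2, "total": 12.5},
    {"customer_id": 1, "total": 7.0},
]
customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]

# Naive join: for every customer, scan every order -- O(customers * orders).
def totals_naive(customers, orders):
    return {
        c["name"]: sum(o["total"] for o in orders if o["customer_id"] == c["id"])
        for c in customers
    }

# Same result after massaging the data into an index keyed by customer_id:
# one pass over the orders, then O(1) lookups -- O(customers + orders).
def totals_indexed(customers, orders):
    by_customer = defaultdict(float)
    for o in orders:
        by_customer[o["customer_id"]] += o["total"]
    return {c["name"]: by_customer[c["id"]] for c in customers}

assert totals_naive(customers, orders) == totals_indexed(customers, orders)
```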
"Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious." -- Fred Brooks, The Mythical Man Month (1975)
I think rule 5 is often ignored by a lot of distributed services: you end up having to make several calls, each with its own HTTP, DB, and "security" overhead, when one would do. Then each of these gets a caching layer because, in aggregate, they are "slow".
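A rough, hypothetical illustration of what that looks like from the caller's side (the endpoint names and token are invented for the example; any HTTP client would do): three round trips that each pay their own connection, auth, and query cost, versus one endpoint shaped around what the caller actually needs.

```python
import requests  # assumes the requests library is available

BASE = "https://api.example.com"
HEADERS = {"Authorization": "Bearer <token>"}

def profile_chatty(user_id: int) -> dict:
    # Three round trips; each one re-does TLS, auth, and DB work server-side.
    user = requests.get(f"{BASE}/users/{user_id}", headers=HEADERS).json()
    orders = requests.get(f"{BASE}/users/{user_id}/orders", headers=HEADERS).json()
    prefs = requests.get(f"{BASE}/users/{user_id}/preferences", headers=HEADERS).json()
    return {"user": user, "orders": orders, "prefs": prefs}

def profile_batched(user_id: int) -> dict:
    # One round trip to a purpose-built endpoint -- usually cheaper than
    # bolting caching layers onto three "slow" calls.
    return requests.get(f"{BASE}/users/{user_id}/profile", headers=HEADERS).json()
```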
Very few software services built today get it right. Most assume they need to scale from day one, pick a technology stack to enable that, and then alter the product to reflect the limitations of the stack they picked. Then they wonder why they need to spend millions on sales and marketing to convince people to use the product they've built, and millions on AWS bills to scale it. Then again, the core problem is really that their company didn't need to exist in the first place, and only does because investors insist on cargo-culting the latest hot thing.
This is why software sucks so much today.
I'll add one more step if you're like me (and apparently many others): go too far with your distribution, then pull it back to a sane number of distributed services (i.e. a small handful), hopefully before you get too far down the implementation path...