A good decommissioning/cleanup strategy definitely helps.
Personally I've also had a lot of success requiring "expiration" dates for all flags. Once a flag's date has passed, it emits a highly visible warning metric. You can always just bump the date another month to defer it, but people eventually get sick of doing that and clean the flag up so the warning goes away for good. Make it mildly annoying, so that cleanup feels like an improvement, and it happens pretty automatically.
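For what it's worth, here's a minimal sketch of that expiration idea in Python. Everything in it is made up for illustration: the `Flag` class, the `warn_on_expired` helper, the flag names, and the `feature_flag.expired.*` metric name are assumptions, not any real flag library's API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Flag:
    name: str
    enabled: bool
    expires: date  # no default, so a flag cannot be declared without a date


def warn_on_expired(
    flags: Dict[str, Flag],
    today: date,
    emit_metric: Callable[[str, int], None],
) -> List[str]:
    """Emit one warning metric per flag whose expiration date has passed."""
    expired = sorted(f.name for f in flags.values() if today > f.expires)
    for name in expired:
        # Hypothetical metric name; point it at whatever your dashboards alert on.
        emit_metric(f"feature_flag.expired.{name}", 1)
    return expired


# Hypothetical flags. "Bumping it another month" means editing `expires` here.
FLAGS = {
    "new_checkout": Flag("new_checkout", True, date(2024, 3, 1)),
    "fast_search": Flag("fast_search", False, date(2024, 9, 1)),
}

if __name__ == "__main__":
    # Stand-in emitter that just prints; a real one would call your metrics client.
    warn_on_expired(FLAGS, date.today(), lambda name, value: print(name, value))
```

Run it on a schedule or at service startup. The only ways to silence the warning are to edit the date or delete the flag, which is the whole point.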
Another issue that I've run into a few times: a feature flag starts as a simple thing, but as new features get added it evolves into a complex bifurcation of logic, and many code paths become dependent on it. That can add crippling complexity to whatever you're developing.
If you work on fifty feature toggles a year, one of them is going to go wrong. If your team is doing a few hundred, you’re gonna have oopsies.
Most of the problematic cases are where the code is set up so that the old path and the new one can’t bypass each other cleanly. They get tangled up, and sometimes the toggle ends up implemented inverted, so it’s difficult to remove the old path without breaking the new one.
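To make the "bypass each other cleanly" point concrete, here's a rough sketch of the shape that tends to stay removable: one branch point, a positively named flag, and old and new paths that share no code. The function names and the `use_v2_totals` flag are hypothetical.

```python
from typing import List, Mapping


def total_legacy(prices_cents: List[int]) -> int:
    """Old path: plain sum. Delete this function when the flag is retired."""
    return sum(prices_cents)


def total_v2(prices_cents: List[int]) -> int:
    """New path (hypothetical behaviour): applies a 10% discount."""
    subtotal = sum(prices_cents)
    return subtotal - subtotal // 10


def total(prices_cents: List[int], flags: Mapping[str, bool]) -> int:
    # The only place the flag is read. The name is positive ("use_v2"),
    # so "on" always means the new path and nothing is inverted.
    if flags.get("use_v2_totals", False):
        return total_v2(prices_cents)
    return total_legacy(prices_cents)
```

Cleanup is then mechanical: delete `total_legacy`, delete the `if`, and have `total` call `total_v2` directly.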
I also like recording and replaying production traffic, so that you can do your tee-testing in an environment that doesn't affect production latency, but that's not quite the same thing.
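A toy version of the replay idea, assuming the recorded requests are already sitting in a list and both code paths can be called as plain functions (real setups replay over the network, and the `replay` helper here is a made-up name):

```python
from typing import Any, Callable, Iterable, List, Tuple


def replay(
    recorded: Iterable[Any],
    old_handler: Callable[[Any], Any],
    new_handler: Callable[[Any], Any],
) -> List[Tuple[Any, Any, Any]]:
    """Run each recorded request through both handlers and collect mismatches."""
    mismatches = []
    for request in recorded:
        expected = old_handler(request)  # current production behaviour
        actual = new_handler(request)    # candidate behaviour behind the flag
        if expected != actual:
            mismatches.append((request, expected, actual))
    return mismatches
```

Since this runs offline against a recording, a slow or crashing new path costs production nothing.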