upvote
While very powerful, I think it's worth calling out some pitfuls. A few things we've ran into - long lived feature flags that are never cleaned up (which usually cause zombie or partially dead code) - rollout drift where different environments or customers have different flags set and it's difficult to know who actually has the feature - not flagging all connected functionality (i.e. one API is missing the flag that should have had it)

A good decom/cleanup strategy definitely helps

reply
Have them emit metrics when it's triggered. You can do a bulk "names X, Y, Z haven't used branch B in >30 days, delete?" task generator pretty easily. Un-triggered ones are also easy to catch if you force all calls to be grep-friendly (or similar), which is also an easy lint to write: unclear result? Block it, force `flag("inline constant", ...)`.

Personally I've also had a lot of success requiring "expiration" dates for all flags, and when passed they emit a highly visible warning metric. You can always just bump it another month to defer it, but people eventually get sick of doing that and clean it up so it'll go away for good. Make it mildly annoying, so the cleanup is an improvement, and it happens pretty automatically.

reply
Yep, archiving feature flags and deleting the dead code is usually thing number 9001 on the list of priorities, so in practice most projects end up with a graveyard of them.

Another issue that I've ran into a few times, is if a feature flag starts as a simple thing, but as new features get added, it evolves into a complex bifurcation of logic and many code paths become dependent on it, which can add crippling complexity to what you're developing

reply
Feature flags are like bloom filters. They make 98 out of 100 situations better and they make the other 2 worse. When performance is the issue that’s usually fine. When reliability is the issue, that’s not sufficient.

If you work on fifty feature toggles a year, one of them is going to go wrong. If your team is doing a few hundred, you’re gonna have oopsies.

Most of the problematic cases are where the code is set up so that the old path and the new one can’t bypass each other cleanly. They get tangled up and maybe the toggle gets implemented inverted where it’s difficult to remove the old path without breaking the new.

reply
You can go even further with something like the gem scientist at the application level, or tee-testing at the data store level. Compare A and A', record the result, and return A. Eventually, you reach 100% compatibility between the two (or only deviations that are desirable) and can remove A, leaving only A'

I also like recording and replaying production traffic, as well, so that you can do your tee-testing in an environment that doesn't affect latency for production, but that's not quite the same thing.

reply
You’ve just resolved a problem I had. I had this problem on a search engine, but I made it as a “v2”. And I told customers to switch to v2. And you know the v2 problem: Discrepancies that customers like. So both versions have fans, but we really need to pull the plug on v1. You’ve just solved it: I should have indexed even records with v1, odd records with v2. Then only I would know which engine was used.
reply