For the use cases outlined in the OP, a 36% performance gain from an optimization that complex would be a waste of time. OP was explicitly not talking about code that cares that much about the performance of its hot path. Most applications spend something like 90% of their runtime waiting on IO anyway, so an optimization of this scale barely changes the end-to-end time.
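To put a rough number on that, here is a back-of-the-envelope Amdahl's law calculation. It assumes the 36% gain means the CPU-bound part runs 1.36x faster and that the remaining 90% (IO wait) is untouched; both are my reading of the numbers above, not measurements, and `overall_speedup` is just an illustrative helper.

```python
def overall_speedup(cpu_fraction: float, cpu_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only the CPU-bound
    fraction of the runtime gets faster by a factor of cpu_speedup."""
    return 1.0 / ((1.0 - cpu_fraction) + cpu_fraction / cpu_speedup)


if __name__ == "__main__":
    # Assumed inputs: 10% of runtime is CPU-bound, hot path becomes 1.36x faster.
    s = overall_speedup(cpu_fraction=0.10, cpu_speedup=1.36)
    saved_percent = (1.0 - 1.0 / s) * 100.0
    print(f"overall speedup: {s:.3f}x, runtime saved: {saved_percent:.1f}%")
```

Under those assumptions the whole program ends up only about 1.03x faster, i.e. under 3% of total runtime saved, which is hard to justify for a complex change.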