Comparative benchmarks are tricky. I considered making a simpler single-pass inference branch which is based around the same data structures to create a more one-to-one comparison. But this algorithm is rather different so porting it wasn't very straight forward. I'm currently integrating this into my real compiler so from there it'll be easier to estimate the real-world performance impact of this system.
Typo has been fixed, thanks!