You had me at "fuzzy", but lost me at "clean up" - because that's what I usually have to do after it went on another wild refactoring spree. It's a stochastic thing, maybe you're lucky and it fuzzy-matches exactly what you want, maybe the distributions lead it astray.
On the line test, I guess it's highly probable that the joke and a few hundred discussions or blog pieces about it were in it's training data.