Instead, training attempts to sample more heavily from higher quality sources, with, I'm sure, a mix of manual and heuristic labeling.
their idioms would leak occasionally otherwise
At least NYT is probably on the correct side of Sturgeon’s Law: https://en.wikipedia.org/wiki/Sturgeon%27s_law
You may get an inconvenient answer when you ask the question the other way around.