upvote
The problem is stats can actually do more with all the data including obvious errors. If you start filtering out data where they miss entered lat log you might introduce a new bias.
reply
Sure we should indeed expect that they do that. But look at enough data and you'll learn that those expectations are a path towards never-ending frustration. I've been there, spending >100 hours cleaning data... that never got published because I was too damn focused on the dozens of years of errors that many, many people created.

To be clear, I'm not saying that we should accept messy data. Just, reality is messy and it's naive to think we can catch and remove all of reality's messiness -- which includes the bureaucratic slop that led to the data being published in the first place.

reply