* Reverse DNS that points to a web site with a discoverable, well-known page clearly describing the crawler's behavior.
* Some sort of reverse-IP-based, RBL- and SPF-inspired TXT records describing who, what, when, why, how, and how often, so that I can make automated decisions based on them (see the sketch after this list).
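
A minimal sketch of what those automated decisions could look like, assuming dnspython and a hypothetical `crawler-info.example` TXT zone. No such registry exists today; the reversed-octet query convention is borrowed from classic DNSBLs:

```python
# Requires dnspython: pip install dnspython
import dns.exception
import dns.resolver
import dns.reversename

def crawler_identity(ip: str) -> dict:
    """Look up who is behind an IP via PTR and an RBL-style TXT record."""
    info = {"ptr": None, "txt": None}

    # Step 1: reverse DNS. A well-behaved crawler's PTR should point at a
    # hostname whose web site documents the crawler's behavior.
    try:
        ptr = dns.resolver.resolve(dns.reversename.from_address(ip), "PTR")
        info["ptr"] = str(ptr[0]).rstrip(".")
    except dns.exception.DNSException:
        pass  # no PTR record: one more strike against the client

    # Step 2: RBL/SPF-inspired TXT lookup. Octets are reversed, as in
    # classic DNSBLs: 203.0.113.7 -> 7.113.0.203.crawler-info.example
    # (crawler-info.example is a made-up zone for illustration only).
    name = ".".join(reversed(ip.split("."))) + ".crawler-info.example"
    try:
        txt = dns.resolver.resolve(name, "TXT")
        # The record would carry the who/what/when/why/how/how-often fields.
        info["txt"] = b"".join(txt[0].strings).decode()
    except dns.exception.DNSException:
        pass

    return info

# Example policy: block clients that identify themselves in neither way.
ident = crawler_identity("203.0.113.7")
if ident["ptr"] is None and ident["txt"] is None:
    print("block: anonymous client")
else:
    print("allow or rate-limit:", ident)
```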
Yah, I don't have a lot of crawlers that I welcome... but I'm building a pretty good database of the worst offenders. And at scale, there are advantages that work in my favor, actually.
I documented this at the end of a blog post several years ago, when I made blocking incoming requests from Amazon a default policy.