I'll tell you what I expect to see from crawlers and agents, and what I'm enforcing on everything that doesn't look distinctly human:

* Reverse DNS that points to a web site with a discoverable / well-known page clearly describing the crawler's behavior.

* Some sort of reverse-IP-based, RBL- and SPF-inspired TXT records describing who, what, when, why, how, and how often, so that I can make automated decisions based on them.
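The first expectation can already be automated today with a forward-confirmed reverse DNS check, which is how some large crawlers (e.g. Googlebot) are conventionally verified. A minimal sketch using only the standard library; the function name is mine, not from the comment, and the TXT-record half would need a DNS library and a record format that doesn't exist yet, so it isn't shown:

```python
import socket
from typing import Optional

def forward_confirmed_rdns(ip: str) -> Optional[str]:
    """Return the verified hostname for `ip`, or None when the
    reverse and forward lookups don't agree (FCrDNS check)."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)        # reverse: IP -> name
        _, _, addrs = socket.gethostbyname_ex(host)  # forward: name -> IPs
    except OSError:                                  # covers herror/gaierror
        return None
    return host if ip in addrs else None
```

A policy layer could then require `forward_confirmed_rdns(client_ip)` to end in an allowlisted domain before serving traffic that doesn't look human.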

Yeah, there aren't many crawlers I welcome... but I'm building a pretty good database of the worst offenders. And there are advantages to scale that actually work in my favor here.

I documented this at the end of a blog post several years ago, when I made blocking incoming requests from Amazon a default policy.
