Because the scraper is either impatient, careless or indifferent; and if they scrape for training data they don't plan to come back. If they don't plan to come back they don't care if you tighten up crawling protections after they have moved on. In fact they are probably happy that they got their data and their competition won't
reply
> they don't plan to come back

To me, the current behavior of those scrapers says that "they don't plan", period.

It looks like they hired a bunch of excavators and are digging 2 meters deep across whole fields, looking for nuggets of gold, and piling the dirt into a huge mountain.

Once they realize the field was bereft of any gold but full of silver? Or that the gold was actually 2.5 meters deep?

They have to go through everything again.

reply
The number of git forges behind Anubis et al. and the numerous public announcements should be enough.

Scrapers seem to be exceedingly careless in using public resources. The problem is often not even DDoS (as in overwhelming bandwidth usage) but rather DoS through excessive hits on expensive routes.

reply
> Ask yourself, why would a scraper ddos?

No need to ask anything, I can tell you exactly: because they have no regard for anything but their own profit.

Let me give you an example of this mom and pop shop known as Anthropic.

You see, they have this thing called ClaudeBot, and at least initially it scraped by iterating through IPs.

Now you have these things called shared hosting servers, typically running 1000-10000 domains of actual low volume websites on 1-50 or so IPs.

Guess what happens when it is your network's time to bend over? The whole hosting company's infrastructure goes down, as each server has hundreds of ClaudeBots crawling hundreds of vhosts at the same time.

This happened for months. It's the reason they are banned in WAFs by half the hosting industry.
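
To make the shared-hosting point concrete, here is a minimal sketch of what a better-behaved crawler could do: space out requests per server IP rather than per domain, so that hundreds of vhosts on one box share a single budget. This is not a claim about how any real crawler is built. The class name `PerIPThrottle`, the method `delay_for`, and the `min_interval` parameter are all made up for illustration; the only real library call is `socket.gethostbyname`.

```python
import socket
import time


class PerIPThrottle:
    """Hypothetical helper: rate-limit by resolved IP, not by hostname."""

    def __init__(self, min_interval, resolve=socket.gethostbyname,
                 clock=time.monotonic):
        self.min_interval = min_interval  # seconds between hits on one IP
        self.resolve = resolve            # hostname -> IP string
        self.clock = clock                # injectable for testing
        self.next_allowed = {}            # IP -> earliest start of next request

    def delay_for(self, hostname):
        """Reserve the next slot on this hostname's IP.

        Returns how many seconds the caller should sleep before requesting.
        """
        ip = self.resolve(hostname)
        now = self.clock()
        start = max(now, self.next_allowed.get(ip, now))
        self.next_allowed[ip] = start + self.min_interval
        return start - now
```

A crawler loop would call `time.sleep(throttle.delay_for(host))` before each fetch. With a per-domain limit, 200 vhosts on one IP means 200 parallel budgets hitting the same machine; keying on the IP collapses them into one.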

reply
Ask yourself, why would everyone except you say that they do?
reply