Where does one find a good robots.txt? Are there any well maintained out there?
Cloudflare actually offers this as a free-tier feature, so even if you don't want to use Cloudflare for your site, you can set up a throwaway domain there and periodically copy the robots.txt it generates from your scraper allow/block preferences. They keep it up to date with all the latest crawlers.
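For reference, a minimal hand-maintained version looks something like this, a sketch blocking a few of the widely published AI training crawlers (GPTBot is OpenAI's, ClaudeBot is Anthropic's, CCBot is Common Crawl's, and Google-Extended is Google's AI-training opt-out token); which bots you block is up to your own preferences:

```
# Block common AI training crawlers; allow everything else.
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
```

Note this only works for crawlers that actually honor robots.txt; the well-known ones above document that they do.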
I'll second a good robots.txt. I just checked my metrics: fewer than 100 requests total to my git instance in the last 48 hours. It's completely public; most repos are behind a login, but a couple are public and linked.
> I wonder if these folks actually tried a good robots.txt?

I suspect that some of these folks are not interested in a proper solution. Being able to vaguely claim that the AI boogeyman is oppressing us has turned into quite the pastime.

> Being able to vaguely claim that the AI boogeyman is oppressing us has turned into quite the pastime.

FWIW, you're literally in a comment thread where GP (me!) says "don't understand what the big issue is"...
