Unintentional denial-of-service attacks from AI scrapers are definitely a problem; I just don't know if "theft" is the right way to classify them. They shouldn't get lumped in with intellectual property concerns, which are a different matter. AI scrapers are a tragedy-of-the-commons problem, kind of like Kessler syndrome: a few bad actors can ruin low Earth orbit for everyone via space pollution, which is definitely a problem, but saying that they "stole" LEO from humanity doesn't feel like the right terminology. Maybe the problem with AI scrapers could be better described as "bandwidth pollution" or "network overfishing" or something.

by oasisbob 2 hours ago

Theft isn't far off; it seems a closer fit to me than using the word for IP violations.

When a crawler aggressively crawls your site, they're permanently depriving you of the use of those resources for their intended purpose. Arguably, it looks a lot like conversion.

by jareklupinski 52 minutes ago

> Arguably, it looks a lot like conversion.

is this why media networks are buying social ai apps

by margalabargala 6 hours ago

Yes I completely agree.

by FeepingCreature 6 hours ago

you're totally right about not being theft, but we have a term. you used it yourself, "distributed denial of service". that's all it is. these crawlers should be kicked off the internet for abuse. people should contact the isp of origin.

by ethmarks 6 hours ago

Firstly, since this argument is about semantic pedantry anyways, it's just denial-of-service, not distributed denial-of-service. AI scraper requests come from centralized servers, not a botnet.

Secondly, denial-of-service implies intentionality and malice that I don't think is present from AI scrapers. They cause huge problems, but only as a negligent byproduct of other goals. I think that the tragedy of the commons framing is more accurate.

EDIT: my first point was arguably incorrect because some scrapers do use decentralized infrastructure and my second point was clearly incorrect because "denial-of-service" describes the effect, not the intention. I retract both points and apologize.

by goodmythical 2 hours ago

ah, no fun, I was going to continue the semantic deconstruction with a whole bunch of technicalities about how you're not quite precisely accurate and you gotta go do the right thing and retract your statements.

boo. took all the fun out of it ;)

by FeepingCreature 5 hours ago

Sufficiently advanced negligence is indistinguishable from malice. There is a point at which you no longer gain anything from treating them differently.

by cdrini 5 hours ago

The first is incorrect; these scrapers are usually distributed across many IPs, in my experience. I usually refer to them as "distributed, non-identifying crawlers (DNCs)" when I want to be maximally explicit. (The worst I've seen is some crawler/botnet making exactly one request per IP -_-)

by aduwah 5 hours ago

I think the second is incorrect too. DDoS is a DDoS no matter what the intent is.

by cdrini 1 hour ago

I think one could argue that one. Is a DDoS a symptom? In which case the intent is irrelevant. Or is a DDoS an attack/crime? In which case intent is relevant. We kind of use it to mean both. But I think it's generally the latter. Wikipedia describes it as a "cyberattack", so actually I think intent is relevant to our (society's) current definition.

by ethmarks 3 minutes ago

The semantics that make sense to me are that "DDoS" describes the symptom/effect irrespective of intent, and "DDoS attack" describes the malicious crime. But the terms are frequently used interchangeably.

by pmlnr 5 hours ago

Been there recently. Rate limiting in nginx and anti-SYN-flood rules in pf solved it.
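
For anyone who hasn't set one up, a minimal sketch of what a per-IP rate limit does, written as a token bucket in Python. This is only an illustration of the general idea, not nginx's or pf's actual implementation; the class name `PerIpRateLimiter` and its parameters are hypothetical.

```python
import time


class PerIpRateLimiter:
    """Token bucket per client IP (illustrative sketch, not nginx's code)."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec  # tokens refilled per second, per IP
        self.burst = burst        # maximum tokens one IP can accumulate
        self.clock = clock        # injectable clock, handy for testing
        self.buckets = {}         # ip -> (tokens, time of last update)

    def allow(self, ip):
        """Return True if this request from `ip` is within its budget."""
        now = self.clock()
        # A previously unseen IP starts with a full bucket.
        tokens, last = self.buckets.get(ip, (self.burst, now))
        # Refill for the time elapsed since the last request, capped at burst.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[ip] = (tokens - 1, now)
            return True
        self.buckets[ip] = (tokens, now)
        return False
```

Each IP gets its own budget, so a limiter like this only helps when individual IPs send many requests.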

by spiderfarmer 3 hours ago

I'm being hit with 300 req/s 24/7 from hundreds of thousands of unique IPs from residential proxies. I can't rate limit any further without hurting the real users.

by oasisbob 2 hours ago

Yeah, IP-based rate limits are nearly ineffective these days.
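
A rough back-of-the-envelope check of why, using the 300 req/s figure from upthread. The IP count of 200,000 is an assumption standing in for "hundreds of thousands", and the traffic is assumed to be spread evenly across IPs, which real traffic may not be.

```python
total_rps = 300        # aggregate load quoted upthread
unique_ips = 200_000   # ASSUMPTION: "hundreds of thousands" taken as 200,000

# Average request rate seen from any single IP, assuming an even spread.
per_ip_rps = total_rps / unique_ips
seconds_between_requests = 1 / per_ip_rps

print(f"{per_ip_rps:.4f} req/s per IP")
print(f"one request per IP every {seconds_between_requests / 60:.1f} minutes")
```

Under those assumptions each IP makes about one request every 11 minutes, which is slower than a typical human visitor, so any per-IP threshold low enough to catch it would also block real users.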