> ...in the age of AI, does anyone have an actual solution for keeping out bots while preserving the privacy of humans?

There isn't one, and pretending otherwise is nonsense because humans will always provide their credentials to something to act on their behalf.

In the limit you end up with Chinese phone farms.

by tardedmeme, 3 hours ago

Right. Botnet operators love Cloudflare because they make so much money renting out compromised machines to pass its tests.

by thisislife2, 2 hours ago

The only solution is regulation. If all content anyone creates is copyrighted, how does it make any sense that scraping is implicitly opt-in (which is effectively the case if you don't create a robots.txt file for your website)? Moreover, even if you do have a robots.txt, AI (or other) bots often don't respect it, or they use workarounds: they outsource the scraping of such "restricted" sites to unethical third parties to get the data (Meta has even resorted to piracy, openly!). So clearly the logic and the "honour system" have failed.
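
To make the "honour system" concrete: robots.txt is only a request, and a crawler follows it only if it chooses to check. A minimal sketch using Python's standard `urllib.robotparser`, with a made-up robots.txt that disallows OpenAI's `GPTBot` token and allows everyone else (the file contents and the `may_fetch` helper are illustrative assumptions):

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt (assumption): block one AI crawler, allow all other agents.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Nothing enforces this: a crawler honours robots.txt only if it
    calls a check like this one and respects the answer.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    print(may_fetch("GPTBot", "https://example.com/articles/1"))       # False
    print(may_fetch("SomeBrowser", "https://example.com/articles/1"))  # True
```

Nothing on the server side acts on that answer; a scraper that never runs the check simply fetches the page.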

Cloudflare, Google's reCAPTCHA, hCaptcha etc. are all shitty technical solutions because, as we are all discovering, they come at the cost of our privacy (i.e. our personal data may be what monetises these services) and/or our computing resources and time. If current copyright laws aren't sufficient to prevent this, we have to acknowledge that the system is broken. The answer could be to extend it with DMCA-like (Digital Millennium Copyright Act) laws, but ones that favour creators against Big Tech or rogue actors.

- Web-scraping and copyright law - https://www.neudata.co/blog/web-scraping-and-copyright-law

- Why DMCA Claims Against Web Scrapers Face Long Odds - https://capstonedc.com/insights/why-dmca-claims-against-web-...

by oceanplexian, 2 hours ago

Or you could let information be free, at least the stuff that’s on the public net.

As for bots overloading websites or using too many resources, scaling laws will take care of that quickly; it’s not like you can’t serve thousands of requests per second from a Raspberry Pi these days.

by ImPostingOnHN, 2 hours ago

I don't think regulation will stop web scraping, not least because it can be done from locations outside the jurisdiction of the regulations.

> we have to acknowledge the system is broken

The system is broken. It probably takes, what, 10 seconds or less to switch to a residential or foreign proxy, versus 6+ months to track down and prosecute a single offender internationally? So, like, a million times more effort going the regulatory route.
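
Taking the comment's figures at face value, and assuming a 30-day month, the ratio works out as follows (the `effort_ratio` helper is just for illustration):

```python
SECONDS_PER_DAY = 24 * 60 * 60

def effort_ratio(months: float, proxy_seconds: float, days_per_month: float = 30) -> float:
    """Prosecution time divided by proxy-setup time, both converted to seconds."""
    return months * days_per_month * SECONDS_PER_DAY / proxy_seconds

print(effort_ratio(months=6, proxy_seconds=10))  # 1555200.0
```

That is roughly 1.6 million, so "a million times" holds as an order of magnitude.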

by thisislife2, 2 hours ago

Just as criminal laws don't end all crime, copyright laws and anti-scraping regulation won't end all scraping. But they will greatly reduce it and limit it to rogue actors. Two examples I can cite here are the laws against email spam and the laws against unsolicited marketing calls: they had a definite impact in reducing both (even in India, where I'm from, where enforcement of laws is often lax).

by JoshTriplett, 2 hours ago

Exactly. Bot activity is a problem of volume, not all-or-nothing. Solving 95% of it would be a win.

by jeroenhd, 1 hour ago

Remote attestation should still be possible with a rooted phone if phone manufacturers weren't so shit. If the attestation happens at the hardware level, it doesn't matter what programs or kernels you're running.

by ravenstine, 47 minutes ago

Or maybe we can actually start paying for all the things we use on the Web, making it prohibitively expensive to deploy fleets of bots.

by cr125rider, 3 hours ago

And identifying a bot that is acting on my behalf. "Claude, go search this topic" is basically the same as Googling something and clicking on the results. Human-driven AI searching needs to be in a different box from AI scraping for training data.

Which sounds extremely difficult to tell apart.

by JoshTriplett, 2 hours ago

Hopefully it stays that way; "a bot acting on my behalf" is still a bot. At least it's often a well-behaved bot and uses a user-agent that can be detected and blocked.
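
A minimal sketch of that kind of user-agent blocking as WSGI middleware. The blocklist tokens are hypothetical placeholders (real ones would come from each crawler operator's published user-agent strings), and since the User-Agent header is self-reported, this only stops bots that identify themselves honestly:

```python
from typing import Callable, Iterable, Optional

# Hypothetical tokens for illustration only.
BLOCKED_UA_TOKENS = ("examplebot", "example-ai-agent")

def is_blocked(user_agent: Optional[str]) -> bool:
    """Case-insensitive substring match of the User-Agent against the blocklist."""
    if not user_agent:
        return False
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_UA_TOKENS)

def block_user_agents(app: Callable) -> Callable:
    """Wrap a WSGI app: answer 403 to blocked agents, pass everything else through."""
    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        if is_blocked(environ.get("HTTP_USER_AGENT")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware

def hello_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Stand-in application for the example."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]

app = block_user_agents(hello_app)
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library will serve it.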

by Gander5739, 2 hours ago

You don't need a non-rooted phone to pass captcha checks; I have a rooted phone and can pass the captchas that ask you to scan a QR code. But I doubt phones without Google services would manage.

by HWR_14, 23 minutes ago

How does scanning a QR code prove any kind of captcha?

by Gander5739, 17 minutes ago

https://support.google.com/recaptcha/answer/16609652 - it just launches the verification service.

by spacedoutman, 2 hours ago

Private, invite-only internets

by csomar, 2 hours ago

They are not a problem unless you "believe" they are a problem. I estimate around 20-25K bot hits to my website per day, and I have all Cloudflare protections disabled. Any decently optimized server should be able to handle that easily (it's roughly one request every 3 to 4 seconds).
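
The conversion behind that figure, assuming the hits are spread evenly over the day (real bot traffic is burstier, so peaks will be higher than this average):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def mean_interval_seconds(hits_per_day: int) -> float:
    """Average gap between requests if the hits were spread evenly over a day."""
    return SECONDS_PER_DAY / hits_per_day

for hits in (20_000, 25_000):
    print(f"{hits:,} hits/day -> one request every {mean_interval_seconds(hits):.1f} s")
# 20,000 hits/day -> one request every 4.3 s
# 25,000 hits/day -> one request every 3.5 s
```

Either way it is well under one request per second on average, which is in the same ballpark as the comment's estimate.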

by specialp, 2 hours ago

Yes, and that is just the background radiation of bots on the internet. I run a site that is a primary source of information, and these botnets are aggressive to the point of a DDoS, all to do some sort of scraping. (I say scraping because their tactics are sophisticated enough that they could DDoS us if they wanted to.) However, I am not sure what their objective is, as they have consumed enough of our resources to have scraped all our content thousands of times over. Those 25K hits are a couple of minutes of traffic for us, and that adds up. 80-90% of our traffic is this.
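
For scale, reading "a couple of minutes" as 2 minutes (an assumption), the same 25K hits arrive about 720 times faster than when spread over a whole day:

```python
SECONDS_PER_DAY = 24 * 60 * 60

def requests_per_second(hits: int, window_seconds: float) -> float:
    """Average request rate for `hits` arriving within `window_seconds`."""
    return hits / window_seconds

spread_over_day = requests_per_second(25_000, SECONDS_PER_DAY)
in_two_minutes = requests_per_second(25_000, 2 * 60)
print(f"{in_two_minutes:.0f} req/s, {in_two_minutes / spread_over_day:.0f}x the rate")
# 208 req/s, 720x the rate
```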

by HWR_14, 22 minutes ago

Assuming that the bots aren't repackaging your content and preventing users from seeing your blog by serving that content to them first.

by thisislife2, 2 hours ago

True. But it still wastes your server resources, right? And it's sad that you have to accept that as part of the "cost" of hosting a site ...

by ndriscoll, 2 hours ago

What resources are you concerned about? An Intel N100 mini PC should be capable of serving something like a blog at 20K+ requests/second (or saturating its network link).

by doctorpangloss, 2 hours ago

Web Environment Integrity

by malka1986, 3 hours ago

> keeping out bots

You can forget about it. It is not possible. Simple as that.

by Wowfunhappy, 3 hours ago

Let's say I'm selling concert tickets. How do I prevent bots from buying up all the tickets and scalping them?

by ranguna, 1 hour ago

Do it the way plane tickets work: tie each ticket to an identity, and offer a buyback up to a week or so before the concert in case someone wants to cancel (or authorize the payment and only capture it a week before). Ask for ID along with the ticket at the entrance.
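
A minimal sketch of the identity-bound scheme, assuming a 7-day buyback cutoff for "a week or so" and face-value refunds; `Ticket`, `buy_back`, and `admit` are hypothetical names, not any ticketing system's API:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

BUYBACK_CUTOFF = timedelta(days=7)  # assumption: "a week or so" before the show

@dataclass
class Ticket:
    ticket_id: str
    holder_id: str   # identity (e.g. an ID document number) recorded at purchase
    price_paid: int
    refunded: bool = False

def buy_back(ticket: Ticket, event_start: datetime, now: datetime) -> int:
    """Refund at face value if requested before the cutoff; returns the refund."""
    if ticket.refunded:
        raise ValueError("ticket already refunded")
    if now > event_start - BUYBACK_CUTOFF:
        raise ValueError("buyback window has closed")
    ticket.refunded = True
    return ticket.price_paid

def admit(ticket: Ticket, presented_id: str) -> bool:
    """At the entrance: valid only if not refunded and the ID matches the named holder."""
    return not ticket.refunded and ticket.holder_id == presented_id

show = datetime(2025, 6, 1, 20, 0)  # hypothetical show time
ticket = Ticket("T-001", "ID-123", 80)
print(admit(ticket, "ID-123"))  # True
print(admit(ticket, "ID-999"))  # False: a resold ticket fails the ID check
```

The design point is that a scalper's only exit is the face-value buyback, so there is no margin left to capture.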

by ndriscoll, 2 hours ago

Sell them via a Dutch auction. Eliminate the arbitrage opportunity for scalpers and make more money in the process.
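
A minimal sketch of a multi-unit Dutch (descending-price) auction, under simplifying assumptions: one ticket per buyer, integer prices, ties broken by list order, and each buyer pays the price at which they bought (some variants instead charge every winner the final clearing price):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Bid:
    buyer: str
    max_price: int  # the most this buyer will pay for one ticket

def dutch_auction(bids: List[Bid], tickets: int, start_price: int,
                  step: int, floor: int) -> List[Tuple[str, int]]:
    """Lower the price by `step` until tickets run out or the price would fall
    below `floor`. Returns (buyer, price_paid) pairs in order of sale."""
    if step <= 0:
        raise ValueError("step must be positive")
    sales: List[Tuple[str, int]] = []
    waiting = list(bids)  # copy so the caller's list is not mutated
    price = start_price
    while tickets > 0 and price >= floor:
        # Everyone whose limit has been reached buys now, in list order, while stock lasts.
        takers = [b for b in waiting if b.max_price >= price][:tickets]
        for b in takers:
            sales.append((b.buyer, price))
            waiting.remove(b)
        tickets -= len(takers)
        price -= step
    return sales

bids = [Bid("alice", 300), Bid("bob", 250), Bid("carol", 120), Bid("dave", 90)]
print(dutch_auction(bids, tickets=2, start_price=400, step=50, floor=50))
# [('alice', 300), ('bob', 250)]
```

Because the price falls only until the tickets are gone, each sale happens close to what that buyer was willing to pay, which is the arbitrage the comment wants to remove.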

by dcrazy, 1 hour ago

That’s how you wind up with only kids of millionaires at your Taylor Swift concert.

by queenkjuul, 8 minutes ago

So a Taylor Swift concert

by MyMemoryfails, 2 hours ago

I'd simply check how fast the form is filled in: even with the browser's autocomplete, humans are slow because they still need to click submit.

Then, while orders are "processing", handle them in bulk and prioritize the slower users. There's a huge opportunity to do bot checks after checkout without affecting the user experience.

Also, for product launches you could add a unique field that the user has to fill in; that way bots can't be prepared for the launch in advance.
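
A minimal sketch of the fill-speed idea: record server-side when the checkout form was served and when it came back, hold orders as "processing", then allocate stock with unflagged orders first and slowest fills first. The 3-second threshold and all names are hypothetical:

```python
from dataclasses import dataclass
from typing import List

# Hypothetical threshold; a real value would have to be tuned against observed traffic.
MIN_HUMAN_FILL_SECONDS = 3.0

@dataclass
class Order:
    order_id: str
    form_served_at: float  # server-side timestamp (s) when the checkout form was sent
    submitted_at: float    # server-side timestamp (s) when the filled form came back

    @property
    def fill_seconds(self) -> float:
        return self.submitted_at - self.form_served_at

def looks_automated(order: Order) -> bool:
    """Flag forms completed faster than the assumed human minimum."""
    return order.fill_seconds < MIN_HUMAN_FILL_SECONDS

def allocation_order(batch: List[Order]) -> List[Order]:
    """Order a batch of held orders for stock allocation:
    unflagged orders first, slowest fill first; flagged orders go last."""
    return sorted(batch, key=lambda o: (looks_automated(o), -o.fill_seconds))

batch = [Order("a", 0.0, 1.2), Order("b", 0.0, 8.0), Order("c", 0.0, 4.5)]
print([o.order_id for o in allocation_order(batch)])  # ['b', 'c', 'a']
```

Timestamps are taken on the server because client-reported timings can be forged. Autofill and password-manager auto-submit can also produce very fast human fills, so this works better as a priority signal than as a hard block.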

by fragmede, 2 hours ago

Huh. No wonder my password manager's auto-submit triggers bot detection (it's a fairly popular one).

by luckylion, 3 hours ago

Tie them to the buyer's identity, offer at-value buy-backs until X weeks before event, disallow resale.
