If there's one thing people, especially HN users, should've learned by now, it's that there's no enforcement mechanism worth a damn for Internet legislation when incentives don't align.
Isn't this basically what content-addressable storage is for? Have the site provide the content hashes rather than the content and then put the content on IPFS/BitTorrent/whatever where the bots can get it from each other instead of bothering the site.
Extra points if you can get popular browsers to implement support for this, since it also makes it a lot harder to censor things and a decent implementation (i.e. one that prefers closer sources/caches) would give most of the internet the efficiency benefits of a CDN without the centralization.
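To make the idea concrete, here's a rough sketch of the fetch-by-hash flow, not how IPFS or BitTorrent actually do it. Assumptions: the content ID is a plain SHA-256 hex digest (real IPFS uses multihash-based CIDs), and `Peer`, `content_id` and `fetch` are made-up names for illustration. The point is that the site only has to publish the hash, and the client can take the bytes from any untrusted source because it re-hashes them before accepting.

```python
import hashlib


def content_id(data: bytes) -> str:
    # Assumption: the ID is just the SHA-256 hex digest of the bytes.
    return hashlib.sha256(data).hexdigest()


class Peer:
    """Hypothetical in-memory stand-in for a cache, mirror, or swarm member."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        cid = content_id(data)
        self._blobs[cid] = data
        return cid

    def get(self, cid: str):
        return self._blobs.get(cid)


def fetch(cid: str, peers) -> bytes:
    # Try peers in order (a real client would prefer closer ones) and
    # accept only bytes whose hash matches the ID the site published.
    for peer in peers:
        data = peer.get(cid)
        if data is not None and content_id(data) == cid:
            return data
    raise KeyError(cid)
```

Usage would be something like `cid = origin.put(page)` on the site side and `fetch(cid, [nearby_cache, other_bot, origin])` on the crawler side. A peer that serves tampered bytes just gets skipped, which is why the origin doesn't need to trust whoever ends up hosting the copies.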
Maybe it'll just be cheaper for CDNs or whatever to sell the data they serve directly, instead of everyone going through the extra steps of scraping it.
It's easy to pretend you're human, it's hard to pretend that you have a valid cryptographic signature for Google which attests that your hardware is Google-approved.
Crawling is the price we pay for the web's openness.
They don't modify any device and will pass whatever attestation you try to make.
Not from their single residential IP, they are not.
If they do succeed[1] - it is not going to be at hundreds or thousands of requests per second that the current AI scrapers bombard servers with. Some dude at home will, at best, be putting 4-6 orders of magnitude less strain on a limited set of servers.
1. Scraping is an arms race: if you're just "some dude" at the skill floor, you're going to have a bad time whether you're scraping or defending against scrapers.
Bloat and bandwidth costs are the real problems here. Everyone seems to have forgotten the basics of engineering and accounting.
What happens when the human gives an agent access to said signature? Then you fall back on traditional anti-bot techniques and you're right back where you started.
I joke, but there are those out there who don’t.
Like, 3 orders of magnitude less compute, conservatively counting.
We don't need to attest signals are analogue vs. digital. The world is going to adapt to the use of Gen AI in everything. The future of art, communications, and productivity will all be rooted in these tools.