Google has further complicated it with its new search announcement, which blurs the line between regular search and AI search. And AI tends not to honor any licenses or instructions when it is hungry for training material.
It is once again an example of Google abusing its dominant position to cross-promote its own products.
That doesn't work anymore. Google provides an AI-generated summary, and nobody looks at the original site.
We found our data in the outputs of their models but who can do anything about it...
If the crawlers refuse to voluntarily respect your robots.txt, then you are well within your rights to poison their data.
Sue for $180,000 per infringement which should be calculated for each illegal API call.
A person should be able to put a line in the terms of use or license page on their website that says "Do not include any content from this website in your AI training data. If you do, you will be billed $100 billion." And it should be enforceable. It just turns out that nerds like to say "oh that would be too hard or too expensive, so we're going to ignore it."
Unauthorized access, system damage, and maybe even extortion all apply here.
Well, at least in the case of Google, I'm pretty sure that's the point. Or at least, they are doing things that would seem to be moving towards being an oracle with all the answers and not the signpost that points you in the right direction. The destination rather than the gateway.
These AI companies are really just a gross example of the motto "Socialize the costs, privatise the profits". It's disgusting!
I know this has repercussions on findability, but if that wasn't a concern, I'm curious how one might circumvent getting crawled.
Most legit search engines are going to honor robots.txt and you can disallow access.
Next level would be using something like rate limiting controls and/or Cloudflare's bot fight mode to start blocking the bad bots. You start to annoy some people here.
Next would be putting the content behind some form of auth.
https://developers.cloudflare.com/browser-run/quick-actions/...
Even when we do actually put physical locks on things they are mostly there to show that someone breaking in did so intentionally and not at all designed to prevent motivated attackers.
Where do you live? In the US it’s actually illegal for anyone except the USPS to deliver to a mailbox.
Also this has gotten pretty far away from the web scraping scenario. There’s no door accidentally opening here.
That being said you would require your user to download a compatible browser for gemini/gopher.
We've been celebrating denying creators revenue for decades...
Maybe this is just the internet hypocrisy of "When I do it, it's good, when they do it, it's bad".
Ad blocking has always been a problem for creators but it's aimed at big corps - non-creators. The creators asked people to support them other ways or turn off the blocking. And it's not like the little independent creators wanted this version of commercialized internet in the first place.
The AI marketing teams are spinning everything they can, but no: AI companies are the conscript, the vultures. No question about it.
The number of people who will not ever load your ads is around 30%.
I can tell you that creators talk about this a lot in private, but will not publicly because the internet has a mass delusion about how creation and compensation work. It's like trying to convince Christians that Jesus obviously didn't come back from the dead days later, despite there being no logical system available that would explain it.
If we were to try and map out a functional internet where everyone wins, users and creators, there is no example where ad blocking is anything other than net harmful. You either get volunteer net where 0.01% share hobby posts on their own dime for the other 99.99% or you get IRC where 99% of the population doesn't really benefit (à la 1993).
People can easily justify their own piracy because it’s small scale. Even when they organize and create a whole software and tooling ecosystem around pirating media to stick into Jellyfin or Plex, the reasoning goes: AI still did it bigger and worse and is bad, what I’m doing is not so bad because I wasn’t going to buy the movie anyway, etc.
It's in no way, shape, or form "small scale", and has fundamentally changed the very nature of the internet for the worse (opinions/views of ad blocking people don't matter).
There is no viable model where "have stuff but not pay for it" works out.
Many of the websites I read do not collect any appreciable amount of money from ads, or have no ads at all (one example: news.ycombinator.com :) ). They want recognition, or to share knowledge, or community, or they are building their brand... And AI is destroying all of this: the first result for "zx80" is an AI overview with a link to Wikipedia and some YouTube videos. If a person stops there, they will never get to the computinghistory.org.uk link, and won't see any related information about the variants and models.
When you click "news.ycombinator.com" you are clicking on the ad.
:)
> Although Anubis could be altered to mine cryptocurrency to serve as proof of work, Iaso has rejected this idea: "I don't want to touch cryptocurrency with a 20 foot pole."
Which in my mind is a shame. Crypto is an absolute mess, yes, but this seems like an elegant way to get something back for putting things out there.
This is the problem crypto fans refuse to acknowledge. The money doesn't magically appear, you're taking it from someone else and letting them hold the bag when whatever cryptocurrency you choose inevitably blows up, fails, or rug-pulls. It's unethical to engage with at all because you're still participating in scamming real money out of private individuals.
Between seeing ads and doing a little bit of proof-of-work for the author, I'd choose the latter.
What's even crazier to think about is that to use the latest versions of these models for which you supplied training data, you have to pay hundreds of dollars a month. I would love to get a settlement check proportional to my model weights. Even if it's $0.10, at least everyone out there will get what they're owed.
I do not value copyright. All it does is give you standing to sue if somebody reproduces your work. It does not differentiate or account for parallel creation. I cannot count how many times I have "created" something, only to find it in a research paper later.
Part of the reason I think copyright has no value is that, in general, individual copyright owners don't have the deep pockets necessary to sue someone who violates their copyright. If anyone is violating the spirit of copyright, it's corporations that insist you assign your work over to them as a work for hire, or outright ignore your copyright. (looking at you, Disney's Atlantis).
A significant benefit of AI that doesn't get talked about enough is that AI has a much greater reach over all the information it was trained on and can draw connections that would be invisible to someone operating at the human scale.
Today you can set a coding agent to migrate an existing application to another language (like chardet). Even if you don't have the code, if you can run the app you can still clone it, using it as an oracle for replication. That is why there will be very little profit in AI usage.
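A toy version of the "oracle" idea, assuming the original app can be called as a black box: feed the same random inputs to the original and to the rewrite, and flag any input where the outputs differ. Everything here is hypothetical (the slug function stands in for the app being cloned, and `find_mismatch` is a made-up helper); it is a sketch of differential testing, not of how any particular agent works.

```python
import random
import re


def legacy_slugify(text: str) -> str:
    """Stand-in for the original app: we can run it, but pretend we cannot read it."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def rewritten_slugify(text: str) -> str:
    """Candidate clone, written without looking at the original."""
    out: list[str] = []
    for ch in text.lower():
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        elif out and out[-1] != "-":
            out.append("-")  # collapse runs of separators, skip leading ones
    return "".join(out).rstrip("-")


def find_mismatch(oracle, candidate, trials: int = 1000, seed: int = 0):
    """Return the first random input where the two disagree, or None."""
    rng = random.Random(seed)
    alphabet = "abcXYZ 09-_!"
    for _ in range(trials):
        text = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 12)))
        if oracle(text) != candidate(text):
            return text
    return None


if __name__ == "__main__":
    print(find_mismatch(legacy_slugify, rewritten_slugify))
    # A deliberately wrong clone, to show the check catching a difference.
    print(find_mismatch(legacy_slugify, lambda s: s.lower().replace(" ", "-")))
```

Passing a check like this only shows the clone agrees on the inputs that were tried, not that it is equivalent everywhere.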