> We found our data in the outputs of their models but who can do anything about it...

If the crawlers refuse to voluntarily respect your robots.txt, then you are well within your rights to poison their data.

reply
robots.txt seems like it should count as legally binding terms of service, which would make crawlers that ignore it outright copyright infringers.

Sue for $180,000 per infringement, counting each illegal API call as a separate infringement.

reply
Was your robots.txt written by a lawyer? Would it hold up in court?
reply
It doesn't matter. Robots.txt is not a license, it's a set of computer-parsable directives for how programs should access your site. The actual license doesn't have to be written for computers to parse to be legally binding.

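To make the "computer-parsable directives" point concrete, here is a rough sketch of how a well-behaved crawler could check robots.txt before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt content, the `is_allowed` helper, and the example.com URLs are made up for illustration; `GPTBot` is used only as an example crawler token.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("GPTBot", "https://example.com/post/1"))
    print(is_allowed("SomeBrowser", "https://example.com/post/1"))
```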
A person should be able to write in a terms of use or license page on their website: "Do not include any content from this website in your AI training data. If you do, you will be billed $100 billion." And it should be enforceable. It just turns out that nerds like to say "oh, that would be too hard or too expensive, so we're going to ignore it."

reply
Why hasn't your company sued OpenAI and tried to argue they're violating the Computer Fraud and Abuse Act? Would it really be impossible to argue this?

Unauthorized access, system damage, and maybe even extortion all apply here.

reply
Lawyers can. As long as that data is actually yours, I mean, in a strictly legal sense.
reply
I mean, did you check the IPs and make sure they’re from OpenAI? Obviously a fly-by-night AI company is going to set its User-Agent to impersonate a big player.
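
For what it's worth, that check could look roughly like the sketch below: compare the request's source IP against whatever CIDR ranges the crawler operator publishes. The ranges here are placeholders from the RFC 5737 documentation blocks, not real OpenAI ranges, and the `ip_in_published_ranges` and `looks_spoofed` helpers are hypothetical names for illustration.

```python
import ipaddress

# Placeholder ranges from the RFC 5737 documentation blocks, NOT real crawler
# ranges. Swap in the ranges the crawler operator actually publishes.
PUBLISHED_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def ip_in_published_ranges(ip: str, ranges=PUBLISHED_RANGES) -> bool:
    """Return True if ip falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in ranges)


def looks_spoofed(user_agent: str, ip: str, claimed_token: str = "GPTBot") -> bool:
    """True if the User-Agent claims to be the crawler but the IP is unlisted."""
    claims_crawler = claimed_token.lower() in user_agent.lower()
    return claims_crawler and not ip_in_published_ranges(ip)
```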
reply