"Anthropic settles with authors in first-of-its-kind AI copyright infringement lawsuit" - https://www.npr.org/2025/09/05/nx-s1-5529404/anthropic-settl...
> However, the judge ruled that Anthropic's use of millions of pirated books to build its models – books that websites such as Library Genesis (LibGen) and Pirate Library Mirror (PiLiMi) copied without getting the authors' consent or giving them compensation – was not.
It seems clear from the article that while the use of pirated works was illegal, the use of copyrighted works (the work a book is based on is still copyrighted even if you buy the book) was fine and transformative.
Does that make my brain copyright infringement? Does Disney now own all my output forever because some small part of me now has Harry Potter embedded?
If you just ignore anything that's inconvenient for your argument, you can make any argument you want.
None of those are relevant factors when it comes to copyright law. You don't get a pass for copyright infringement just because you're not copying the entire work. Same goes for a copy that's transient. You can't set up a bootleg movie theater in your home, even if you delete the movie file afterwards, and there's no trace of the movie aside from the viewers' vague memories.
And yet they very much are. US copyright law has the concept of "fair use" in 17 U.S. Code § 107 [0]. I'll paste here for your benefit, #3 is the one I referenced as most obvious but #1 and #4 are also very relevant:
(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
(2) the nature of the copyrighted work;
(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
(4) the effect of the use upon the potential market for or value of the copyrighted work.
Naturally remembering some parts of a legally purchased book verbatim is fair use. "Memorizing" the entire library obtained via torrents and incorporating that in a commercial product that can output all that content doesn't sound like fair use to me. The US justice system is too captured and corrupt at this point to take as reference because decisions there are bought by the highest bidder. But for the purpose of this discussion let's not play dumb for the benefit of trillion dollar corporations.
If you're going to invoke fair use, that opens up a whole can of worms on what counts as transformative. The Google Books case and the Google thumbnails case show that you can make near-verbatim copies of works at scale and still be considered fair use.
>The US justice system is too captured and corrupt at this point to take as reference because decisions there are bought by the highest bidder. But for the purpose of this discussion let's not play dumb for the benefit of trillion dollar corporations.
This is begging the question. The original question is whether AI companies are getting special treatment. You can't then use that as a premise to say that the courts are tilted towards AI companies. Not to mention it's questionable how AI companies were suddenly able to corrupt all the judges, some of whom were appointed decades ago, even though the companies only got rich a couple of years ago.
No, and neither do LLMs. They're trained on vast quantities of data and retain only a fraction of it.
You might think of it as very, very lossy compression that generates new outputs rather than the original input unless something unintentional happens.
> If you just ignore anything that's inconvenient for your argument, you can make any argument you want.
I'm not. I just understand how it actually works. You either don't understand or are deliberately ignoring that what you just said is literally and technically untrue to make some sort of political statement.
Somewhere between the two a line must be drawn… where we’d want to put that line, I guess, is up for quibbling. But it doesn’t seem obvious to me.
The Google Books and Google thumbnails cases have so far upheld that even mechanical reproductions are allowed, depending on the context/usage.
Sometimes they go a bit wonky and overtrain on specific phrases, which can result in verbatim copies of brief sections of content. That's a bug, not a feature.
Humans reading or watching copyrighted material isn't considered "making a copy" for the purposes of copyright law. Machines doing so generally is.
if you prompt it to, yes. just like your browser dutifully navigates to any copyright-infringing resource and GETs and POSTs whatever you ask of it.
(also it can't, not really, only small snippets before going off rails. LLMs aren't magic, they can't losslessly compress an exabyte of training data into a few terabytes of weights.)
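To put rough numbers on that, here is a back-of-the-envelope sketch. The sizes are assumptions for illustration only: "an exabyte" is taken as 10^18 bytes and "a few terabytes" as 3 TB, which is a guess and not a measured figure for any real model.

```python
# Back-of-the-envelope check; every size here is an illustrative assumption.
EXABYTE = 10**18   # bytes, decimal SI definition
TERABYTE = 10**12  # bytes, decimal SI definition

training_bytes = 1 * EXABYTE  # "an exabyte" of training data, as in the comment
weight_bytes = 3 * TERABYTE   # "a few terabytes" of weights; 3 is a guess

# Ratio a lossless scheme would need in order to store everything in the weights.
ratio = training_bytes / weight_bytes

# How many bits of weight storage exist per byte of training data.
bits_per_training_byte = weight_bytes * 8 / training_bytes

print(f"required compression ratio: {ratio:,.0f}:1")
print(f"weight bits per training byte: {bits_per_training_byte:.6f}")
```

Under those assumptions the weights have a tiny fraction of one bit available per byte of input, whereas general-purpose lossless compressors typically manage only single-digit ratios on text. So whatever is retained has to be very lossy.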
As for your "technically not copyright infringement" defense: those laws are from a time when those patterns couldn't be derived and distributed at scale. A human had to learn and teach them. That made it different. The scale enabled by modern tech makes it a whole different situation. In the same way, one person standing on a street corner people-watching for a bit isn't that bad, but a whole constellation of Flock cameras constantly monitoring everyone's movements and making that data available to any of their customers is really, really bad. The law will have to catch up to this.
No, for the same reason that me giving you a word cloud of the frequency of words within Harry Potter isn’t infringement. It’s a novel transformation.
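For anyone who wants to see what the word-cloud comparison amounts to, here is a minimal sketch of a word-frequency count. The function name `word_frequencies` and the sample strings are made up for illustration, and this says nothing about how a court would treat any particular transformation.

```python
import re
from collections import Counter

def word_frequencies(text: str) -> Counter:
    """Count how often each word occurs, discarding word order entirely."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Placeholder text, not taken from any real book.
sample = "the cat sat on the mat and the dog sat too"
freqs = word_frequencies(sample)
print(freqs.most_common(2))

# Two different sentences can produce identical counts, so the counts
# alone are not enough to reconstruct the original text.
same_counts = word_frequencies("dog bites man") == word_frequencies("man bites dog")
print(same_counts)
```

The second check is the relevant part: once word order is thrown away, different texts become indistinguishable, which is why the frequency table is not a copy of the work.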
If so, why do we still pay for games and movies?
this is an incorrect interpretation (in the usa, at least).
downloading a game/movie is still the creation of an unauthorized copy, which is not allowed. not to mention that playing/watching does not count as a "novel transformation".
(17 U.S.C. § 106 and 17 U.S.C. § 501 are the relevant pieces of reading)
So if you pirate a bunch of content you still get in trouble for that. But if you somehow make a business out of that that isn’t just redistributing those materials, then that business itself isn’t infringing.
ISPs and trigger-happy law firms don't send you a C&D for downloading a torrent, they do so for seeding a torrent. It's just that practically nobody "just seeds" a torrent so people colloquially claim they got busted for downloading a torrent.
In theory this means if you torrent as a 100% leecher and turn off seeding from the get-go, you should be in the clear. But nobody sensible would dare test the extent of German Legal Spite, much less do so repeatedly to science the shit out of it.
If you can download through another protocol, say HTTP, however---<transmission interrupted!>