And the judgement said that the training was fair use, but that the duplication might be an infringement. The GFDL doesn't restrict duplication, only distribution, so if training on GFDLed material is fair use and not the creation of a derivative work then there's no damage.
reply
> The GFDL doesn't restrict duplication

Right. I can publish the work in whole without asking permission. That’s unrestricted duplication.

However, as I read it, an LLM spitting out snippets from the text is not “duplicating” the work. That would fall under modifications. From the license:

> A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.

I read that pretty clearly as saying that any work containing text from a GNU FDL document is a modification, not a duplication.

reply
There are three steps here:

1) Obtaining the copyrighted works used for training. Anthropic did this without asking for the copyright holders' permission, which would be a copyright violation for any work that isn't under a license that grants permission to duplicate. The GFDL does, so no issue here.

2) Training the model. The case held that this was fair use, so no issue here.

3) Whether the output is a derivative work. If so then you get to figure out how the GFDL applies to the output, but to the best of my knowledge the case didn't ask this question so we don't know.

reply
Last time I checked, online LLMs distribute parts of their training corpus when you prompt them.
reply
For this to stand up in court you'd need to show that an LLM is distributing "a modified version of the document".

If I took a book and cut it up into individual words (or partial words even), and then used some of the words with words from every other book to write a new book, it'd be hard to argue that I'm really "distributing the first book", even if the subject of my book is the same as the first one.

This really just highlights how the law is a long way behind what's achievable with modern computing power.

reply
You’re just describing transformative use. I’m not a lawyer, but here's an example from music: taking a single drum hit from a James Brown song is apparently not transformative. Taking a vibe from another song is also maybe not transformative, e.g. Robin Thicke and Pharrell’s “Blurred Lines” was found in court to have taken the “feel” of Marvin Gaye’s “Got to Give It Up”.

Which is all to say that the law is actually really bad at determining what is right and wrong, and our moral compasses should not defer to the law. Unfortunately, moral compasses are often skewed by money, like how normal compasses are skewed by magnets.

reply
Presumably, a suitable prompt could get the LLM to produce whole sections of the book, which would demonstrate that the LLM contains a modified version.
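
As a rough illustration of what "demonstrating" that might look like, here is a minimal Python sketch that finds the longest verbatim run shared between a source text and a model's output. The function name `longest_shared_run` and both sample strings are made up for illustration; how long a shared run would have to be to matter legally is a separate question that this does not answer.

```python
from difflib import SequenceMatcher


def longest_shared_run(source: str, output: str) -> str:
    """Return the longest substring that appears verbatim in both texts."""
    # autojunk=False keeps frequent characters from being ignored on long inputs.
    matcher = SequenceMatcher(None, source, output, autojunk=False)
    match = matcher.find_longest_match(0, len(source), 0, len(output))
    return source[match.a : match.a + match.size]


# Hypothetical stand-ins for a licensed document and a model response.
source_text = "The quick brown fox jumps over the lazy dog near the river bank."
model_output = "As the story goes, the quick brown fox jumps over the lazy dog every day."

run = longest_shared_run(source_text, model_output)
print(len(run), repr(run))
```

This only catches exact copying. Paraphrase or translation, which the license's "Modified Version" wording also covers, would need a different kind of comparison.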
reply
Yes, and for practical purposes the current consensus (and, in the case of the EU, the law) is that only said document would be covered by the FDL.
reply
I am distributing an SVG file. It’s a program that, when run, produces an image of Mickey Mouse.

By your description of the law, this SVG file is not infringing on Disney’s copyright, since it’s a program that when run creates an infringing document (the rasterized pixels of Mickey Mouse) but is not an infringing document itself.

I really don’t think my “I wrote a program in the SVG language” defense would hold up in court. But I wonder how many levels of abstraction it takes before it’s legal? Like if I write the mickey-mouse-generator in Python, does that make it legal? If it generates a variety of randomized images of Mickey Mouse, is that legal? If it uses statistical analysis of many drawings of Mickey to generate an average Mickey Mouse, is that legal? Does it have to generate different characters if asked before it is legal? Can that be an if statement, or does it have to use statistical calculations to decide what character I want?
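
To make the "levels of abstraction" concrete, here is a toy Python sketch in which the same placeholder drawing comes out of a stored literal, a function that computes it, and a function that only draws it when asked. The three circles are a stand-in, and the names (`generate_mouse`, `draw_character`, `"mouse"`) are invented for illustration. Nothing here says where a court would draw the line.

```python
SVG_OPEN = '<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">'
SVG_CLOSE = "</svg>"

# Level 0: the drawing stored directly as data.
STATIC_SVG = (
    SVG_OPEN
    + '<circle cx="50" cy="60" r="30"/>'
    + '<circle cx="25" cy="25" r="15"/>'
    + '<circle cx="75" cy="25" r="15"/>'
    + SVG_CLOSE
)


def circle(cx: int, cy: int, r: int) -> str:
    """Build one SVG circle element."""
    return f'<circle cx="{cx}" cy="{cy}" r="{r}"/>'


def generate_mouse() -> str:
    """Level 1: a program that computes the same drawing."""
    shapes = [circle(50, 60, 30), circle(25, 25, 15), circle(75, 25, 15)]
    return SVG_OPEN + "".join(shapes) + SVG_CLOSE


def draw_character(name: str) -> str:
    """Level 2: an if statement picks which character to draw."""
    if name == "mouse":
        return generate_mouse()
    # Any other request gets a single plain circle.
    return SVG_OPEN + circle(50, 50, 40) + SVG_CLOSE


# All three routes end in exactly the same string.
print(STATIC_SVG == generate_mouse() == draw_character("mouse"))  # prints True
```

The output is identical at every level. The only thing that changes is how much machinery sits between the stored data and the picture, which is exactly what the questions above are poking at.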

reply