As an artist, your license didn't ban learning from your work. Unless your content was acquired without a license at all, you absolutely gave them permission to use it in training sets.

That is the gap in the legal landscape.

reply
No I didn't. It's use in a software product without my permission. That's never been allowed.

Just because you obfuscate what's happening by calling it "learning" and pretending your model is actually just looking at pictures the same as a human, doesn't make it true.

reply
Unfortunately you did grant that permission. Once you granted the permission for someone to hold a copy, they have the permission to process it.

I can assure you that you didn't grant a license with an exclusive list of operations that can be performed on your work of art. At best you may have had something like a "no commercial use" clause and general broad terms.

reply
I thought it was at most a monetary fine; do people go to jail for copyright infringement? But you seem to want to own all the air around your work, and the ground beneath it too. Nothing can exist around it, so a creative person would do better to avert their eyes rather than load up on useless ideas. Why should I install your "furniture" in my brain when I am not allowed to sit on it? In these cases I think authors provide a net negative to society by creating more works that further forbid others from creating in the same space.

Here, for example, any comment is open to read and respond to. On arXiv any paper can be downloaded, read and cited. Wikipedia contains text from many thousands of editors, building on each other. We like collaboration more than asserting our exclusivity rights. That is why these places provide better quality than work done for direct profit or, God forbid, ad revenue. That is where the slop starts flowing.

reply
>IP laws can stay the same, but they should have purchased a license to use my art before including it in their training data.

But including your art in the training data is fair use (or otherwise exempt) by most standards, as no reproduction occurs. You are advocating for a change to IP law to make it more restrictive.

reply
> But including your art in the training data is fair use

The four factors of fair use in the US:

> the purpose and character of your use

Commercial, for-profit. Not scholarship, not research, not commentary, not parody, etc.

> the nature of the copyrighted work

Absolutely everything. Artistic, creative, not purely factual.

> the amount and substantiality of the portion taken, and

All of it, from everyone.

> the effect of the use upon the potential market.

Directly competing with those whose data was copied.

reply
Factors 3 and 4 are what that argument is based on, I believe: 3) on the basis that the output does not _reproduce_ the input, and 4) on the similar grounds that output that's not at all the same as the input data isn't affecting the market for the original image. I think 4 is the more debatable one, but in general the existing cases have struggled at the early stages because the plaintiffs have not been able to point to output that is a copy of their part of the input, and this does actually matter.
reply
> the amount and substantiality of the portion taken, and

> All of it, from everyone.

Yeah, I'd like to see how drawing two circles violates the copyright of drawing one circle!

reply
Fair use by most standards? Which standards are those? I don't think a standard about training an AI on billions of images exists.
reply
By the same 'transformative' standards that allow satire, reaction and commentary videos to exist. And those take 100% from the source and add context, whereas good generated AI images that aren't wholesale copying take like less than 10% from the original source.

In addition, the idea that you need to pay rent on *your observation* of someone else's work is absurd. No one pays Newton's descendants for making lifts or hosting bungee jump sport activities.

reply
> good generated AI images that aren't wholesale copying take like less than 10% from the original source.

So would the model work if it only trained on the top 10% of pixels in every image? Or do they in fact need the entire image before they begin processing it, and therefore use the entire image?

> In addition, the idea that you need to pay rent on your observation of someone else's work is absurd.

I agree that's absurd. But training a model is no more "observing images" than an F1 car is "walking" down a race track. Just because a race car uses kinetic energy, gravity, and friction to propel itself, the same way a human does, doesn't mean it's doing the same thing as a human. That comparison you're making is the real absurdity.

reply
Is it transformative if I take all the pages in Hanya Yanagihara's A Little Life and use a thesaurus to change every second word?

Or a more realistic scenario: what if I translate it to Spanish without license from the author? That's not allowed, and yet I have "transformed" the work in the same way that an LLM does.

reply
These are my opinions ofc.

> Is it transformative if I take all the pages in Hanya Yanagihara's A Little Life and use a thesaurus to change every second word?

If you meant it literally, I'd think that such a version would be a sort of parody. It'd be up to lawyers doing their cross-examinations to prove the work was intended for such a purpose, though.

> Or a more realistic scenario: what if I translate it to Spanish without license from the author? That's not allowed, and yet I have "transformed" the work in the same way that an LLM does.

A lawyer would probably answer this better than me, but the 'content' is the same and would violate copyright. There are also other factors, like whether it was translated/distributed for free.

Besides that, I regard LLMs as holding mathematical observations, in contrast to a translated work. So long as the user ensures the output isn't close to what's already available, imo it fits the transformative criteria.

reply
No precedent has been set when it comes to training and fair use.
reply
Which case decided that?
reply
> But including your art in the training data is fair use

It shouldn't be!

reply