The four factors of fair use in the US:
> the purpose and character of your use
Commercial, for-profit. Not scholarship, not research, not commentary, not parody, etc.
> the nature of the copyrighted work
Absolutely everything. Artistic, creative, not purely factual.
> the amount and substantiality of the portion taken, and
All of it, from everyone.
> the effect of the use upon the potential market.
Directly competing with those whose data was copied.
> All of it, from everyone.
Yeah, I'd like to see how drawing two circles violates the copyright of drawing one circle!
In addition, the idea that you need to pay rent on *your observation* of someone else's work is absurd. No one pays Newton's descendants for making lifts or hosting bungee jumping.
So would the model work if it only trained on the top 10% of pixels in every image? Or do they in fact need the entire image before they begin processing it, and therefore use the entire image?
> In addition, the idea that you need to pay rent on your observation of someone else's work is absurd.
I agree that's absurd. But training a model is no more "observing images" than an F1 car is "walking" down a race track. Just because a race car uses kinetic energy, gravity, and friction to propel itself, the same way a human does, doesn't mean it's doing the same thing as a human. That comparison you're making is the real absurdity.
Or a more realistic scenario: what if I translate it to Spanish without license from the author? That's not allowed, and yet I have "transformed" the work in the same way that an LLM does.
> Is it transformative if I take all the pages in Hanya Yanagihara's A Little Life and use a thesaurus to change every second word?
If you meant it literally, I'd think that such a version would be a sort of parody. It'd be up to lawyers, in their cross-examinations, to prove the work was intended for such a purpose, though.
> Or a more realistic scenario: what if I translate it to Spanish without license from the author? That's not allowed, and yet I have "transformed" the work in the same way that an LLM does.
A lawyer would probably answer this better than me, but the 'content' is the same, so it would violate copyright. There are also other factors, such as whether it was translated and distributed for free.
Besides that, I regard LLMs as holding mathematical observations, in contrast to a translated work. So long as the user ensures the output isn't close to what's already available, IMO it fits the transformative criteria.
It shouldn't be!