[0] https://archive.org/details/hisyo00simo/page/n1/mode/2up
I don't think we should "get over" the fact that modern SOTA models couldn't exist without being trained on protected works.
That someone, at some point, paid for.
I'd like to understand why I can't use a song in one of my videos without permission/payment, but an AI company can train models using that song without having either.
I'm not anti-AI. I'd just like to see companies play by the rules everyone else has to follow.
Because training isn't redistribution.
You can also listen to the song and make a new one that sounds similar, just like the AI can.
Answer: They did not. That is literally why there are dozens of ongoing lawsuits.
You're right, it's an unjust situation. And you may note that no one else besides the AI companies has made any progress at all towards changing it.
Copyright will soon die, having outlived its usefulness to society. Whether the knife is held by someone named Stallman or someone named Altman is of little consequence.
Because when you say you are “using” the song, what you mean is that you are distributing copies of the song, which is protected by copyright.
When AI companies train on the song, the model is learning from it. Outside of the rare cases of memorisation, this is not distributing copies and so copyright doesn’t have any say in the matter.
Learning isn’t copying, so copyright doesn’t get involved at all.
The New York Times is suing both OpenAI and Microsoft for copyright infringement. The Authors Guild is suing OpenAI. Getty Images is suing Stability AI. Disney is suing Midjourney. Universal Music Group and Sony have filed suits against multiple AI companies.
> so copyright doesn’t get involved at all.
The dozens of ongoing cases discredit that statement.
Your objection doesn’t make sense. In the event that an AI company loses a lawsuit for copyright infringement based on simply training on copyrighted works, the answer to you saying you’d like to understand why they can do it and you can’t is simply “your premise is wrong; neither of you can”.
I object to your statement that "copyright doesn’t get involved at all" when that is objectively untrue. If that were true, many of the world's largest companies wouldn't be spending tens of millions of dollars to have that question answered in court. Go to any law-focused forum, and you will find attorneys arguing over these questions.
To train a model using a book, you must first obtain a copy of that book. Did OpenAI purchase a copy of every book not already in the public domain used during training? They did not.
Some of the suits I mentioned claim that OpenAI literally stole copies of books to train its models.
My point is that the copyright question has not been answered. If the NYT et al. win, it will be a watershed moment for how AI companies pay for training data moving forward.