Edit: I haven't followed the law in a while, but you could definitely copy, scan and digitize documents for yourself and your friends (copia privada).
There is a reason we call them styles: a style is a recognizable pattern someone came up with, maybe after decades of work.
You don't even need legally acquired source material to produce work in a certain style.
The new reality allows original creators to actually track the chain, so we're in this situation.
If one or two people take an apple from your tree it’s not a big deal; if a machine takes 10,000, it is.
So if you scan a book you are making a copy. In some copyright jurisdictions this is allowed for individuals under a private copying exception (a copyright opt-out, if you like), but the important thing is private use. In some jurisdictions there is also a fair use exception, which allows you to exploit the rights protected by copyright under certain circumstances. Fair use is quite specific, though, and one big issue with it is that your use cannot result in something that competes with the original work.
Other acts restricted by copyright include distribution, adaptation, performance, communication and rental.
So if you copy a book, digitize it, and write a program to analyze the word frequencies it contains, you may, in some jurisdictions but not all, be allowed to do this.
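For concreteness, a minimal sketch of the kind of word-frequency program meant here. The function name, the tokenizing regex and the sample sentence are all made up for illustration; a real analysis would read the digitized text from a file.

```python
import re
from collections import Counter


def word_frequencies(text: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Return the top_n most common words in text with their counts."""
    # Assumption: a "word" is a run of ASCII letters or apostrophes,
    # compared case-insensitively. Real texts may need better tokenizing.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)


# Made-up sample text standing in for the scanned book.
sample = "the cat sat on the mat and the dog sat too"
print(word_frequencies(sample, top_n=2))  # [('the', 3), ('sat', 2)]
```

Note that the program only counts words; it never reproduces the text itself, which is part of why this kind of analysis is the easy case.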
If you’re doing it locally on your own machine you are simply copying it. If you do it in the cloud you are copying it and communicating the copy. If you copy it, analyze it and train an AI model on it, that could be considered fair use in certain jurisdictions. Whether the outputs are adaptations of the training data is a matter of debate in the copyright community.
But importantly, if you commercialise that model and the resulting outputs compete with the copyright-protected material you used to train it, your fair use argument may fail.
So when you buy a book you are actually party to what is effectively a licence granted by the copyright holder, albeit to the publisher. But as the end user of the book you are still restricted in what you can do with that copyright-protected work, through a universal end user licence encoded in law.
And I think the same could happen to LLMs. If it took all the fossil fuel on Earth just to barely be able to drive a car to a car wash, there's more wrong with the car than with the oil price.
Where did you get that idea? The global economy is ~$200T/year PPP. 0.1% of that (roughly $200B/year), split across every artist you want the training data from, would be insanely difficult for the vast majority of them to turn down. Which makes sense, as art isn’t that big a percentage of the global economy compared to, say, housing, food, medical care, infrastructure, military spending etc.
Obviously the incentive to take without compensation is far more appealing, but that doesn’t mean it was impossible to make a reasonable offer.
That's kind of an interesting concept: "since the scale of my transgression was so big, I should get away with it scot-free."