The ideal world would be one where, to train on art, you have to buy a license to that art. Sure, most artists would probably set a low price, but that isn't the point.

The point isn't about money. It's that copies of art were made without a license, without permission, and without any legal right to do so, and then used to train a system that generates similar art. The first step, the copy, is illegal without a license, and even most public images online come with licenses and copyright notices attached (which must be preserved).

reply
"Without any legal right to do so" is for the courts to decide. And so far, the courts are very much not deciding the way you want them to.

"Fair use" counters "without license and without permission" hard. The argument that training AI on scraped data is "fair use" and the resulting model outputs are "transformative works" has held up in courts. Anthropic got dinged for downloading pirated books, but not for throwing the ones they didn't pirate down the training pipeline.

Some countries, like Japan, have amended their copyright laws to make AI training categorically legal. Others are in fair-use grey areas, with courts deciding case by case based on precedent and interpretation. So trying to latch onto copyright law is, as it always was, the wrong move. Copyright never favored the little guy. Stupid to expect that it suddenly will.

reply
Ideal for whom? For society in general, I don’t think so.
reply
I think you may be placing too much value on the output of these machines, which use tons of energy, generate pollution (both noise and chemical), and produce output that's worse than what a human can do. We would be better off if these LLMs didn't exist.
reply
The average person in the US reducing his/her meat intake by a quarter would do much, much more for the environment than completely scrapping the entire AI infrastructure worldwide. For some reason, people concerned with the environmental impact of AI get really angry whenever I point this out.
reply
The average person here would do more still by just taking one less flight. It's air travel that really blows individual emissions out of the water.
reply
I think it would obviously be better for society.
reply
> A counterfactual world where artists were paid for AI training is one where an average artist is 5 cents richer, an average image generation AI performs 5% worse, and the bulk of extra data spending is captured by platforms selling stock photos and companies destructively digitizing physical media.

No, a counterfactual world where artists were paid for AI training wouldn't see commercially viable AI at all. A world which plenty of people would be more than happy to live in, mind you.

AI relies on mass piracy worth googols of dollars if you count it the way the "million-dollar iPod" was counted, but because AI surprised the copyright industry, it's now too late to enforce copyright like that.

reply
> but because AI surprised the copyright industry, it's now too late to enforce copyright like that.

I think I've got whiplash from the way a lot of the tech scene has gone from 'IP troll outfits are malicious actors who make everything worse for everyone else' to 'IP troll outfits are an ethical and effective solution to exploitation in the AI industry'.

I'm not a huge fan of much of the generative AI industry, but is IP maximalism really the answer here? Before 2022, most of us would have agreed that DRM is generally a scourge, for example, and the 'copyright industry' is a big part of the push to end general-purpose computing in favour of DRM-controlled appliances. Personally, I'd rather go in the opposite direction: copyright lasts for exactly thirty years, after which a work enters the public domain without exception, and I'd weaken anti-circumvention laws too.

reply
"Copyright" is, frankly, just an excuse people who hate AI latch onto.

Many of the people who rally against AI now used to rally against Napster being prosecuted by the RIAA and against the Big Mouse getting copyright terms extended yet again.

It's not that they suddenly gained an appreciation for copyright law. It's that they found something they hate more than the big record label megacorps - and copyright became a tool they think they can leverage against it. Very stupid, IMO.

reply
Even in a counterfactual world where any data that's not in public domain can't be used in AI training at all, ever, AIs would exist. Training on public domain data is a bitch, but it's doable. It's just that it results in worse AIs for more effort. So no one does it other than to flex.

It would still be "commercially viable", mind. I'm not sure how much it would stall AI development in practice, but all the inputs of making AIs only get cheaper over time. So I struggle to imagine not having something like DALL-E 1 by 2030.

If we extend the counterfactual and allow for licensed media, we compress the timelines and raise the bar. The "best" image generation AIs of 2026 are now made by the likes of Adobe and locked behind some kind of $500-a-month-per-seat Creative Cloud Pro Future subscription - because Adobe is rich enough to afford big bulk licensing deals, while academia and smaller startups have to subsist on old public domain data, permissively licensed scraps, and small, carefully selected batches of licensed data whose licensing terms might block them from sharing the resulting weights.

In the "counterfactual: licensed media" world, the local AI generation powerhouse of Stable Diffusion ecosystem probably doesn't exist at all. Big companies selling AI do. Their offerings cost a lot more and perform considerably worse than the actual AIs we have today. So you can't just go to a random website and get an image edited for a shitpost for free. But the high end commercial suites exist, they're used by the media and the marketing companies, and they are still way cheaper than hiring artists. The big copyright companies get their pound of flesh, but don't confuse that for the artists getting a win.

reply
> No, a counterfactual world where artists were paid for AI training wouldn't see commercially viable AI at all. A world which plenty of people would be more than happy to live in, mind you.

You reckon Disney and Shutterstock don't have enough images to make commercially viable AI?

Or for that matter, Facebook? Even just for photorealistic images from, you know, all the photos people upload.

> AI relies on mass piracy worth Googols of dollars if you count like you would the million dollar iPod, but because AI surprised the copyright industry, it's now too late to enforce copyright like that.

Not that I disagree that people use everything they can get their hands on for marginal improvements (they obviously do), but the copyright industry being "surprised" is the default state of affairs for infringement. And "piracy" is the wrong word, because that's a legal matter, and the judges so far have ruled that training isn't itself a copyright offence, while also affirming that it is possible to commit a copyright offence by pirating training data.

reply
If it were about this, why do OpenAI and Anthropic lose their minds when people train off their output or try to scrape their systems?

I actually don't have an issue with training off the mass of everyone's work if the models are open and free to build upon. It's locking them away, and then throwing your toys out of the pram when people try to do the same thing, that bothers me.

reply
If the dataset weren't valuable, big tech wouldn't depend on it to train their models.

I don't care about getting a millionth of a cent as an artist (which btw is a number *you* just pulled out of your imagination). I care about them paying a fair share instead of pocketing it, so the money stays in circulation instead of creating a new class of technofeudal lords.

reply
If it were about this, why do OpenAI and Anthropic lose their minds when people train off their output?
reply
> Pre-training is very much a matter of scale - and scraping is merely the easiest way to get data at scale.

Therein lies the problem. AI firms just bulldozed ahead and "just did it" with no consideration for ethics or legality. (Nor, for that matter, for how they're going to get this data in the future, now that they're pushing artists into unemployment and filling the internet with slop.)

There is no "imagined counterfactual", people just want AI firms to follow basic ethics and apply consent. Something tech in general is woefully inadequate at.

The counterfactual isn't offered by artists, but by AI companies: "If we had to ask consent then we couldn't have made this." Okay, so? The world isn't worse off without OpenAI's image generator. Who cares? There's no economic value to these slop images; they're merely replacing stock assets and quickly thrown-together MS Paint placeholders.

Given how much of a shitshow this technology has always been (I refuse to mince words: this tech had its "big break" as "deepfakes", and Elon Musk has escalated that even further; it's always been sexual harassment), the actual net value to society is almost certainly negative.

reply