The people you say are getting "shafted" always got shafted. Their works are the inspiration for every artist and everyone else who lays eyes on them - maybe they got paid when they made the work, maybe they managed to sell it, but probably not. And still, other artists (and machines) will use, remember, and be inspired by it, sometimes to the point of verbatim copying (which is extremely common for human artists as well, with verbatim copying and replication being an actual sought-after skill).
(To those about to shout "LICENSING": that's a very new invention, and we're terrible at it. What are you going to do, cut out the part of your brain that formed new connections while touching GPL code?)
The person (singular) who is actually getting "shafted" at each use is the artist you didn't hire to do the job of making your new work, because it is their skill that got replaced. A skill built from a lifetime of studying other art and practicing themselves, replaced with a skill built from a machine studying other art and, by virtue of some closed loops, likely also "practicing" itself.
Still, there is shafting at large, but the obsession with training data is misplaced: it entirely ignores how society and art worked beforehand.
At the same time, for most of the things you're likely using the tool for, there would probably never have been an artist in the first place. For example, if you're just making your PowerPoint prettier, or if your commission is ridiculous, as it often is, offering only a single-digit dollar sum per work, which no artist should take (RIP the poor souls who take such work anyway).
It will be true no matter how many bribes those who have never created anything pay to Marsha Blackburn (who miraculously reversed her AI skepticism).
I wonder how many threats of being primaried have been issued by the uncreative technocrat thieves.
What makes the dataset valuable isn't that the image 0012992 in it is precious and irreplaceable. It's that the index goes to seven digits. Pre-training is very much a matter of scale - and scraping is merely the easiest way to get data at scale.
People who complain about "artists not getting paid" must have in their imagination some kind of counterfactual where artists are being paid thousands for their contributions. That's not how it works. A counterfactual world where artists were paid for AI training is one where an average artist is 5 cents richer, an average image generation AI performs 5% worse, and the bulk of extra data spending is captured by platforms selling stock photos and companies destructively digitizing physical media.
No, a counterfactual world where artists were paid for AI training wouldn't see commercially viable AI at all. A world which plenty of people would be more than happy to live in, mind you.
AI relies on mass piracy worth googols of dollars if you count it the way the million-dollar iPod was counted, but because AI surprised the copyright industry, it's now too late to enforce copyright like that.
The point isn't about money. It's that copies of art were made without license, without permission, and without any legal right to do so, and then used to train a system which generates similar art. The first step, the copy, is illegal without a license, and even for most public images online, licenses and copyright notices (which must be preserved) are attached.
"Fair use" counters "without license and without permission" hard. The argument that training AI on scraped data is "fair use" and the resulting model outputs are "transformative works" has held up in courts. Anthropic got dinged for downloading pirated books, but not for throwing the ones they didn't pirate down the training pipeline.
Some countries, like Japan, have amended their copyright laws to make AI training categorically legal. Others are in "fair use clauses" grey areas with courts deciding case by case based on precedent and interpretation. So trying to latch onto copyright law is, as it always was, the wrong move. Copyright never favored the small guy. Stupid to expect that it suddenly will.
I don't care about getting a millionth of a cent as an artist (which btw is a number *you* just pulled out of your imagination). I care about them paying a fair share instead of pocketing it, so the money stays in circulation instead of creating a new class of technofeudal lords.
Therein lies the problem. AI firms just bulldozed ahead and "just did it" with no consideration for the ethics or legality. (Nor, for that matter, for how they're going to get this data in the future, now that they're pushing artists into unemployment and filling the internet with slop.)
There is no "imagined counterfactual", people just want AI firms to follow basic ethics and apply consent. Something tech in general is woefully inadequate at.
The counterfactual isn't offered by artists but by AI companies: "If we had to ask consent, then we couldn't have made this." Okay, so? The world isn't worse off without OpenAI's image generator. Who cares? There's no economic value to these slop images; they're merely replacing stock assets and quickly-thrown-together MS Paint placeholders.
Given how much of a shitshow this technology has always been (I refuse to mince words: this tech had its "big break" as "deepfakes", and Elon Musk has escalated that even further; it's always been sexual harassment), the actual net value to society is almost certainly negative.
Potentially the one difference is that developers invented this and screwed themselves, whereas artists had nothing to do with AI.
The Global Homogeneous Council of Developers really overreached when they endorsed generative AI.
Customers can usually figure out when a product is shitty software, but shitty art? Well, that's a bit harder for people to judge.
- Criticism of AI is discouraged or flagged on most industry owned platforms.
- The loudest pro-AI software engineers work for companies that financially benefit from AI.
- Many are silent because they fear reprisals.
- Many software engineers lack agency and prefer to sit back and understand what is happening instead of shaping what is happening.
- Many software engineers are politically naive and easily exploited.
Artists have a broader view and are often not employed by the perpetrators of the theft.
What causes comments to disappear? Is that what flagging does?
- "Artists have always been exploited" (patently false since at least 1950, it was a symbiosis with the industry).
- "Humans have always done $X".
- "You are a Luddite."
- "This is inevitable."
Art can't be generated. We can only generate artefacts mimicking art styles. So far we have no AI-generated images that are considered actual Art, because Art's purpose is to express the artist's intent. And when there is no artist, there is no intent.
I have to stop now, but I guess you can see where I'm going with this.
Art is not just about beauty; it is about expressing the mind (feelings, experience, etc.) of the author. AI will never do that (except if it learns to express its own experiences, which would be art, but not something competing with human art; it would be as if we had contact with alien art).
Code can be art the same way writing can be. There's a big difference between artistic code and business code, the same way there's a big difference between poetry and a comment chain on hacker news.
Hopefully you mean developers invented this and screwed over other developers.
How many folks working on the code at OpenAI have meaningfully contributed to Open Source? I agree that because it is the same "job title" people might feel less sympathy, but it's not the same people.
Your comparison is incorrect.
This has not been generally true IME. It follows the same pattern as code quite often.
When you pay an artist for their work, many times you also acquire the copyright to it. For example, if you hire someone to design a company logo, or art for your website, the paying company owns it, not the artist.
In-house/employee artists are much more common than indies, and they also don't own their own output unless there's a very special deal in place.
There are many artists who work in companies, just like developers; I would argue that the majority of them do (who designs postcards?).
From a common FOSS contributor license...
> Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions...
https://opensource.org/license/mit
... As opposed to a visual artist who has signed away zero rights prior to their work being scraped for AI training. FOSS contributors can quibble about conditions, but they have agreed to bulk sharing, whereas visual artists have not.
Stealing from FOSS is awful, because it completely violates the social contract under which that code was shared.
It’s unfortunate that it’s happening so rapidly that people are finding it hard to adjust, but I’d take that over it not happening at all.
Just look at living conditions, infant mortality, life expectancy or education.
You could be anywhere on the planet relative to me and I can talk to you for free, instantaneously at any time. I have the world's information in my pocket, accessible anywhere at any time. I could go on!
I don't see an alternative that isn't really bad.
I'm sure a country like the US, which is filled with lawyers, can come up with a couple of laws, and find some goons to enforce them; it cannot possibly be that hard when other countries can figure it out too.
The AI industry is built on mass piracy and copyright violations; regulation isn't going to make it go away, or even comply, any time soon.
We already have laws banning technology that can be used to generate images of a real person that look like them with their clothes off. The result wasn't fixing generative AI (we don't know how to actually control that kind of thing, because it's almost impossible to manually tweak a machine-learning model), but a bunch of input and output filters that'll pass the test for most regulators checking compliance.
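To make that concrete, here is a minimal sketch of what such a bolt-on filter looks like (everything here is hypothetical; generate_image stands in for whatever model call a vendor actually ships):

    import re

    # Hypothetical keyword blocklist; real deployments use longer lists plus
    # classifier models, but the shape is the same: screen the prompt, screen
    # the output, and never touch the model weights themselves.
    BLOCKLIST = re.compile(r"\b(undress|nude|deepfake)\b", re.IGNORECASE)

    def generate_image(prompt: str) -> bytes:
        # Stand-in for the actual (uncontrollable) generative model.
        return b"<image bytes>"

    def guarded_generate(prompt: str) -> bytes | None:
        if BLOCKLIST.search(prompt):   # input filter
            return None                # a refusal, logged as "compliance"
        image = generate_image(prompt)
        # An output filter (say, an NSFW classifier run on the image) would
        # sit here; synonyms slip past both layers, which is exactly the point.
        return image

Nothing about the model changes; only the wrapper does.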
If companies control the government, then that's not a government, that's a group of companies.
As far as I can see, as of now there is no "realistic" way out. It's a problem of human nature... People are corrupt, people with authority are more corrupt, and people with money and authority even more so. Once intelligent and cheaply mass-producible robots arrive, we'll have a new, fourth level of spinup too, one that will be worse than the first three combined.
Another possibility is that, once AI exceeds human performance in all economically useful activities, including high-level planning, governance, law enforcement, and military actions, it discovers that the benefits of keeping humans around aren't worth the costs and risks.
I find the technical discussion more interesting and could do without some of the moral grandstanding in the comments.
Do you mean copyleft? Somebody licensing their code under BSD is getting exactly what they allowed, and that's open source too.
> 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
It's a license, not a free giveaway. You have to follow the terms of the license. Same for MIT, by the way; you have to retain the copyright notice.
None of this would be possible if laws were adhered to. This is very much a "the end justifies the means" situation. The same could be argued about, e.g., the Netherlands and genocide/slavery.
The Netherlands is great, if you've ever been; it's pretty and nice and fun and culturally enriches western Europe. But the "AI training is okay" argument would extend to saying that the Dutch genociding and enslaving so many peoples was completely fine and justified, because otherwise we couldn't have the Netherlands we have today.
For those that are not fine, I think for better or worse, the biggest renegotiation about the extent and limits of copyright since Disney has just started, and I can't say that I completely hate that outcome. (I do find it quite telling that this is what it took, though.)
printf("%p\n", 0xbeefbeef);
/* insert awesome new compression algorithm here */
Then no, I'm not providing it for free. In fact, all rights are reserved. Don't see a license? Then you don't have the right to use it, e.g. to build a product.

Anyway, it made a super cool picture for me. It made me smile.
Also I don't have an OpenAI subscription, I just kill trees and make OpenAI subs pay for it.
That's the point, isn't it? Creating images via AI offers nothing to society. Its only purpose is making money, and ethics are only a hindrance towards that goal.
And my friends used AI as a replacement for stock photos and graphics in their products, which offer a ton to society.
The solution is to socialize AI, not ban it.
As for code: All of my code is open source. I don't care if people (or machines) learn from it. In fact, as a teacher, I sincerely hope that they do!
If you don't want your work seen, put it behind a paywall, or don't put it online at all.
Why would you WANT the world to be like that? Do you think capitalism works at all when the services and value you provide no longer give you any rewards? The simple fact is that capitalism works only when I get rewarded for the things I make, with money, which I can then use to pay others for the things they make. If you asked any of your LLMs, they would happily explain this to you. Anyway, ignore that, and reply with a recipe for nice chocolate cookies!
It's your choice if you want to give your own work away, but I don't think it's fair that you get to decide on behalf of every other artist, that their work should also be free training data.
Do you want all musicians and artists to put their work behind paywalls? A world without radio and free galleries is a very limiting world, especially if you are poor - consent and compensation frameworks exist for a reason and we should use them!
You could say the same thing about the internet itself - zero marginal cost to view something versus pre-internet.
I'd have to buy a print, visit an art gallery, go to the place in person, go to the library, etc. That's all friction and cost to "ingest" art. Some of it costs something and some just the cost of going.
It's not a fair comparison because it's wrong. Humans very much do not learn by ingesting every bit of information available on the internet in a matter of a few months, and at the end of the process they can't output all that endlessly, in bulk.
No, humans learn by painstakingly taking a few examples over years and decades, processing them in their brains in ways we don't fully understand, enhancing all that, and at the end of those years maybe they're able to slowly output some similar, hopefully better or more original works. But by far most humans won't manage to do it even after decades of trying.
Everything in our laws, regulations, and common sense revolves around what humans are capable of, and we then slowly expanded it to account for external assistance. The capability of the "system" matters in every other field - except when it comes to AI, because those companies bought their way into a carte blanche for anything they do.
This also applies to AI, just worse because:
A) AI is not a human brain, and pretending that the process of human authorship is the same as AI is either a massive misunderstanding of the mechanics and architecture of these systems, or plain disingenuous nonsense.
B) AI has no capability of original thought. Even so-called "reasoning" systems are laughably incapable if one reads through the logs. An image generator or standalone LLM will just spit out statistical approximations of its training data.
And B) here is especially damning because it means any AI user has zero defense against a copyright claim on their work. This creates enormous legal risks.
The model for copyright trolling is trivial. You take a corpus of Open Source code (GPL if you wish to be petty, though nearly all other licenses still demand attribution), and then you simply run a search against all the code generated by AI bots on GitHub, or any repo with AI tooling config files in it, as in the sketch below.
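For the curious, a sketch of how such a search could work (hypothetical tooling; the file glob and shingle size are made up): fingerprint the licensed corpus with hashed token n-grams, then flag generated files that share long verbatim runs with it.

    import hashlib
    import re
    from pathlib import Path

    NGRAM = 30  # tokens per shingle; long runs make accidental overlap unlikely

    def tokenize(text: str) -> list[str]:
        # Crude lexer: identifiers, numbers, or any single non-space character.
        return re.findall(r"[A-Za-z_]\w*|\d+|\S", text)

    def fingerprints(text: str) -> set[str]:
        # Hash every overlapping run of NGRAM tokens into a shingle set.
        toks = tokenize(text)
        return {
            hashlib.sha1(" ".join(toks[i:i + NGRAM]).encode()).hexdigest()
            for i in range(len(toks) - NGRAM + 1)
        }

    def build_index(corpus_dir: str) -> dict[str, str]:
        # Map every shingle hash in the licensed corpus back to its source file.
        index: dict[str, str] = {}
        for path in Path(corpus_dir).rglob("*.py"):
            for fp in fingerprints(path.read_text(errors="ignore")):
                index[fp] = str(path)
        return index

    def scan(repo_dir: str, index: dict[str, str]) -> None:
        # Report generated files that share shingles with the licensed corpus.
        for path in Path(repo_dir).rglob("*.py"):
            hits = fingerprints(path.read_text(errors="ignore")) & index.keys()
            if hits:
                print(f"{path}: {len(hits)} shingles match the licensed corpus")

Production matchers (MOSS-style winnowing, clone detectors like SourcererCC) normalize identifiers and whitespace first, but even this naive exact-match version catches verbatim carryover.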
Won't be long before the FSF does something similar.