Dowling v. United States, 473 U.S. 207 (1985): The Supreme Court ruled that the unauthorized sale of phonorecords of copyrighted musical compositions does not constitute "stolen, converted or taken by fraud" goods under the National Stolen Property Act.
It's perfectly reasonable to say it's okay for humans to do something but not okay for a computer program to do the same thing. We don't have to equate AI to humans, that's a choice and usually a bad one.
It would not be reasonable to allow machines to do that at unlimited scale without restrictions.
(Hopefully the fossil fuels industry won't draw inspiration from the legal arguments made by AI companies...)
Is there any line past which it becomes unreasonable?
> It would not be reasonable to allow machines to do that at unlimited scale without restrictions.
If the machines were a replacement for a damaged respiratory system in a human, would it be reasonable?
What about if the machine were being used by a human to do something else that was important?
Where is the line where it becomes reasonable?
That's exactly the question we should be asking about AI and fair use.
Now, if you'll excuse me, I need to catch a metal shuttle that chucks itself through the air on wings.
The relevant extension of your analogy is should birds be required to obey FAA rules? Or should plane factories be protected as nesting sites?
The mental calisthenics required to justify this stuff must be exhausting.
It's only exhausting if you think copyright ever reasonably settled the matter of ownership of knowledge and want to morally justify an incoherent set of outcomes that you personally favor. In practice it's primarily been a tool for the powerful party in any dispute to hammer others for disrupting their business model. I think that's pretty much the only way attempting to apply ownership semantics to knowledge or information can end up.
Knowledge consists of, roughly speaking, thoughts.
(a "justified true belief" - per https://plato.stanford.edu/entries/knowledge-analysis/ - is a kind of thought)
The "thinking" part of a "thinking being" - that also consists of thoughts.
If your knowledge is someone's property, you are someone's property.
A society where all knowledge is proprietary, is a society of ubiquitous slavery.
Maybe multi-layered, maybe fractional, maybe with a smiley-face drawn on top.
Doesn't matter.
I mean, I don't think I could find a better description for following the derivatives of error in reproducing a set of works than "creating a derivative work".
I agree. However, the reverse is also likely true, i.e., it cannot currently be denied that learning in humans is different from learning in artificial neural networks from the point of view of production of works that mix ideas/memes from several works processed/read. Surely, as the article says, copyright law talks exclusively about humans, not machines, not animals.
Edit: Or, put more pseudo-legally, that the created works infringe on the copyrights of the original human creators.
The above does not follow from, imply or conclude anything about learning in artificial neural networks and humans being similar or dissimilar.
Copy/pasting at scale, yes
Code gets turned into tokens and then it learns the next most likely token.
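For anyone who hasn't seen "learns the next most likely token" spelled out, here's a toy version. It's a minimal sketch under big simplifying assumptions: whitespace-split words stand in for real subword tokens, and a bigram count table stands in for a neural network's learned weights. The function names are made up for illustration.

```python
from collections import Counter, defaultdict
from typing import Optional


def tokenize(text: str) -> list[str]:
    # Assumption: whitespace-split words stand in for a real subword tokenizer.
    return text.split()


def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    # "Training": count how often each token follows each other token.
    counts: dict[str, Counter] = defaultdict(Counter)
    for doc in corpus:
        tokens = tokenize(doc)
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def most_likely_next(model: dict[str, Counter], token: str) -> Optional[str]:
    # Predict the most frequent follower of `token`; None if it was never seen.
    followers = model.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


corpus = [
    "def add ( a , b ) : return a + b",
    "def sub ( a , b ) : return a - b",
]
model = train_bigram(corpus)
print(most_likely_next(model, "return"))  # prints: a
```

A real model generalizes far beyond a count table, but the training objective (predict the next token from what came before) is the same idea.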
The issue I see most people talk about is the scale at which it learns.
A human will learn from other people’s code but not from every persons code.
Copyright law is very clear that if a machine does it, the original copyright on the input is kept. This is why your distributed binaries are still copyrighted, because the machine transformed, very significantly, the source code into binary which maintains the copyright throughout.
It would be inconsistent for the courts to suddenly decide that "actually, this specific type of machine transformation is actually innovative."
I know this is generally really bad for the AI industry, so they just ignore it until a court tells them they can't anymore. And they might get away with it as I don't have faith that the courts will be consistent.
And the specifics of autoregressive pretraining is that it is lossy compression. Good luck finding which copyrighted materials have made it into the final weights.
Yup, it absolutely does. In fact, that's why you are still violating copyright law by using bittorrent even though each of the users is only giving out a small slice or shred of the original content.
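To make the "small slice" point concrete: BitTorrent (v1) splits a file into fixed-size pieces, publishes a SHA-1 hash for each piece, and each downloader reassembles the pieces byte-for-byte from whichever peers have them. The sketch below is an illustration only, assuming a tiny piece size and no networking; the function names are hypothetical, not from any BitTorrent library.

```python
import hashlib

PIECE_SIZE = 4  # Illustrative only; real torrents use far larger pieces.


def split_into_pieces(data: bytes, size: int = PIECE_SIZE) -> list[bytes]:
    # Cut the file into fixed-size slices; the last one may be shorter.
    return [data[i:i + size] for i in range(0, len(data), size)]


def piece_hashes(pieces: list[bytes]) -> list[bytes]:
    # v1 torrents carry one SHA-1 digest per piece so peers can verify each slice.
    return [hashlib.sha1(p).digest() for p in pieces]


def reassemble(received: dict[int, bytes], hashes: list[bytes]) -> bytes:
    # `received` maps piece index -> bytes, gathered from any peers in any order.
    out = []
    for i, expected in enumerate(hashes):
        piece = received[i]
        if hashlib.sha1(piece).digest() != expected:
            raise ValueError(f"piece {i} failed its hash check")
        out.append(piece)
    return b"".join(out)


original = b"Copyrighted work, shipped in slices."
pieces = split_into_pieces(original)
hashes = piece_hashes(pieces)

# Two peers, each holding only every other slice.
peer_a = {i: p for i, p in enumerate(pieces) if i % 2 == 0}
peer_b = {i: p for i, p in enumerate(pieces) if i % 2 == 1}

restored = reassemble({**peer_a, **peer_b}, hashes)
print(restored == original)  # prints: True
```

No single peer holds the whole file, yet the reassembled output is an exact copy.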
The US has a granted defense in the case of something like shredding called "Fair Use" but that doesn't mean or imply that a copyright is void simply because of a fair use claim.
> And the specifics of autoregressive pretraining is that it is lossy compression.
That doesn't matter. Why would it? If I take a FLAC recording and convert it to an MP3, the fact that it was a lossy transform doesn't suddenly give me the legal right to distribute the MP3.
> Good luck finding which copyrighted materials have made it into the final weights.
That's what the NYT v. OpenAI lawsuit is all about. And for earlier models they could, in fact, pull out full NYT articles which proved they made it into the final weights.
Further, the NYT is currently in discovery which means OpenAI must open up to the NYT what goes into their weights. A move that, if OpenAI loses, other litigants can also use because there's a real good shot that OpenAI also included their works in the dataset.
Well, it's not the first time the law contradicts the laws of nature (for the entertainment of future generations). BitTorrent is not a relevant example, because the system is designed to restore the work in its entirety.
> in fact, pull out full NYT articles
That's when they used their knowledge of the exact text they wanted to "retrieve" to get the text? It wouldn't be so efficient with a random number generator, but it's doable.
You can restore shredded documents with enough time and effort. And if you did that and started making photo copies, even if they are incomplete, you will run afoul of copyright law.
Bittorrent is a relevant example because it shows that shredding doesn't destroy copyright.
Remember, copyright is about the right to copy something. Simply shredding or destroying a thing isn't applicable to copyright. Nor is giving that thing away. What's applicable is when you start to actually copy the thing.
EDIT: I don't say that neural networks can't rote learn extensive passages (it's an effect of data duplication). I'm saying that they are not designed to do that and it's possible to prevent that (as demonstrated by the latest models).
The way I arrive at that is imagine you add just 1 pixel of static to a video, that'd still be a copyright violation. Now imagine you slowly keep adding those random pixels. Eventually you get to the point where the whole video is just static, but at some point it wasn't.
Now, would any media company or court sue over that? Probably not. But I believe that still falls under copyright (but maybe fair use?).
The issue with neural networks is they aren't people. Even when you point your LLM at a website and say "summarize this", the output of that summarization would be owned by the website itself by nature of it being a machine-transformed work.
Remember, it's not just mere rote recitation which violates the law; any transformation counts as well. The fact that AI companies are preventing it doesn't really solve the problem that they are in fact transforming multiple copyrighted works into their responses.
What would violate copyright is if you took that rendered page, turned it into a JPEG, and then hosted that JPEG from your own servers. That's the copying that would run afoul of copyright law.
I have seen LLMs do all sorts of crap which was clearly reproduction of training material.
This is also why people are most impressed with how much better it is at reproducing boilerplate rather than, say, imaginative new ideas.
If the LLM generates output that a court decides is sufficiently derivative, and especially (but not necessarily) if the LLM was trained on the source material being infringed, then whoever redistributes the derivative output is going to be liable for copyright infringement.
Creation of the LLM itself is transformative, but LLM output which infringes is not.
In Community for Creative Non-Violence v. Reid (https://en.wikipedia.org/wiki/Community_for_Creative_Non-Vio...), the Supreme Court held that commissioning a work and directing its author does not make the commissioner the author; authorship belongs to the person actually doing the work.
The author can grant authorship and copyright to the commissioner by contract, but the monkey picture (and other cases) have established that only humans can be granted copyright. Since LLMs aren't human, they can't hold copyright, and if the LLM doesn't hold a legal copyright, it has no legal right to assign that copyright to you.
Code is protected by copyright as a literary work. The method is not protected by copyright, that would be the domain of patents. What's protected are the words.
If you say "Claude, build me a website about X" then you do not have any creative control over the literary work Claude is producing. You just told a machine to write it for you. Nor, like a compiler, is it derivative of any other work that you wrote.
If, on the other hand, you are working jointly with Claude to make specific changes to the code on a line-by-line basis, then you will have no problem claiming copyright over the code. Claude in this case is acting as a tool, but there's still a human making decisions about the code.
In the case where you wrote a bunch of markdown and then told Claude to generate the corresponding code but didn't have any involvement in writing the code itself, you could perhaps claim that the code is a derivative work of the markdown, but a court would have to handle that on a case-by-case basis and evaluate how much control you exerted over the work.
No, a copyright application can be filed with a corporation listed as the author. Watch for the copyright notice at the end of the next major movie you see.
In any case, the corporation did not create the product, people created it and their contractual relationship with the corporation defined how the ownership of that work was managed. So, I don't find it too unusual that this element of personhood is available to corporations.
Under at least the EU AI Act, work done by AI is not granted copyright. But that does not mean copyright does not apply; it means the amount of work credited to AI is set at 0% (simplification). A human working from another's work will, unless it's a perfect copy, have "credit" for changes that are judged creative/transformative, meaning a human plagiarizing something can still claim some degree of authorship. An AI won't.
In a sense, the copyright status of the final work is a sort of "sum with dilution" where each work involved adds to the claims, but the AI's output is set at 0; the prompt or further rework by a human is not.
As for employer, details vary but generally "work for hire" rules and contracts do reassignment of material rights (in EU and some other places you can not reassign moral rights which are a different thing).
I think what this means is that the employee may not be the copyright owner for multiple reasons, which are possibly applicable simultaneously. It does not imply that the employer owns copyright over the work that is in public domain, which would be a contradiction.
I honestly don't understand why the attitude that underlies this is so prevalent.
When I write code, what I write and how I write it is informed by having read countless source code files over my education and my career. Just as I ingest all that experience to fine-tune how my later code is written, so does the LLM from the code it's seen.
The immediate retort to that is that the LLM is looking at code that wasn't its to read. But I don't think that's a valid objection. Pretty much by definition, everything I've learned from has a copyright on it, and other than my own code on my own time, that copyright is owned by someone else. Much of the code that's built up my understanding has been protected by NDA, or even defense-department classifications: it wasn't mine in any way. But it still informs how I do all my future coding.
By analogy: I'm also an artist, especially since my retirement. My approach to photography was influenced by Ansel Adams, and countless other artists whose works I've seen displayed in museums, or in publications and online. My current approach to painting was inspired by Bob Ross and others, and the teachers who have helped me develop. I've taken pieces of what I've seen in all their work, and all of that comes out in my photos and paintings, to varying degrees.
I've taken ideas from others in code and in art, and produced something (hopefully!) different by combining those bits with my own perspective. I don't think anyone has a claim on my product because of this relationship.
Likewise, I know that many of my successors have learned from my code (heck, I led teams and wrote a book about software development!). And I hope that someday my artwork has developed to the point where there's something in it that's worth someone else's attention to assimilate. I've never for a minute - even decades before the advent of LLMs - hoped or even imagined that my work would remain locked up with me, and that the ideas would follow me to the grave.
As they say, we are all standing on the shoulders of giants. None of us would be able to achieve the tiniest fraction of what we have, without assimilating what has come before us. Through many layers of inheritance it's constantly being incorporated in subsequent works.
In a few decades at best, I'll be dead. It probably won't be very long after that when people even forget my name. But the idea that something I've done - my work in developing software systems, or in my photography and painting - will continue to have ripples through time, inspires me and gives me hope that I'll have some tiny shred of immortality beyond my personal demise.
I live in the UK, and most US law is based upon English common law; it's not some immutable code handed down to us from above. It's based upon the assumptions and capabilities of the entities participating in the system at the time the law was codified. It can and should change to make more sense if those assumptions and capabilities shift massively.
If they have only the rights that their human creators have, then access to them cannot be sold, in the exact same way that I cannot sell you a database that I have collected filled with copyrighted material. The "humans do training too" argument only holds if you imbue LLMs with similar rights to humans.
I am allowed to sell myself (in a very limited capacity) to others for them to exploit my training, even if that training was on protected material, which is a privilege humans should have, but machines should not.
However, because it is an issue with (at least historical) goals of copyright law, the common pattern that is evolving is that AI is not granted copyright of any work it generates, making it a bit of poison pill for some of the egregious ideas of corporate abuse. Not sure if the weights will be considered copyrightable either.
The nature of the source material matters though. Training a model on open source software seems perfectly fair - it has explicitly been released to the public, and learning from the code has never been a contested use.
IMO the questions around coding models should be seen as less about LLMs and more as a subset of the conversation about large companies driving immense profits from the work of volunteers on open-source projects, i.e. it's more about open source than AI.
I can't see how it's justifiable to say that training on data is the same as "stealing", when the claim that learned information a person could retain and reproduce constitutes copyright infringement is the subject of many dystopian narratives, like this one, where once your brain is uploaded to the cloud you have to pay royalties on every media product you remember.
When it picks out a rare bit of code, it will simply copy that code, illegally, and present it without attribution or any license. That is in fact breaking the law, but AI companies are too important for the law to apply to them.
There have been instances where models have spat out comments in code that mention the original authors, etc., effectively outing themselves as copyright thieves.
There's nothing anyone can do about it, but the suspicion is that the big companies have taken everyone's code on GitHub, without consent, and trained on it.
And now they are spitting out big chunks of copyrighted code and presenting it as somehow transformed, even though all they've actually done is change a few variable names.
It is copyright theft, but because programmers are little people, not Disney, we don't have any recourse.
It's pretty likely that I've done the same thing. I mean, I've written enough CRUD functions in my life, for example, that in all likelihood I'm regurgitating stuff that's a copy, for all practical purposes, of stuff I've done before as work-for-hire for my employer. I'm not stealing intentionally or consciously, but it seems quite likely that it's happening. And that's probably true for many of you, at least that have been in the industry for a while.
I asked agent X what is the source of training data it generated code from, it couldn’t say. Then I asked why the code implementation is exactly the same as the output of agent Y. It said they were trained on the same ‘high-quality library’, and still couldn’t say which one.
So I guess that’s fine because everyone is doing it.
https://www.npr.org/2025/09/05/g-s1-87367/anthropic-authors-...
When I write fizzbuzz do I owe royalties to the inventor of fizzbuzz? Is my brain copyright thieving because I can write out the song lyrics from memory?
Few people ever actually read open source code, but I'd like to think on the rare occasions they do, they share a connection with the author. I know when I read somebody else's code, for me to understand it I have to be thinking about the problem the same way they were when they wrote it. I feel empathy with them and can sometimes picture the struggle, backtracking, and eureka moments they went through to come up with their solution.
Somehow I don't get the same warm fuzzy feelings about a machine powered by investor money ingesting my work automatically, in milliseconds, and coldly compressing it down to a few nudges on a few weights out of trillions of parameters. All so the machine can produce outputs on-demand for lazy users who will never know of me or appreciate my little contribution, and ultimately for the financial benefit of some billionaires who see me as an obsolete waste of space.
I guess I'm just irrational that way.
And so does well-crafted bespoke software.
The engineers who built the foundation for the industrial expansion of our forefathers went through the exact same thing we're going through now. They looked at what existed and used it to inform their efforts. This is what LLMs do.
I'm not attempting to moralize here, just comment on the parallels. Do I agree that a craftsman's work is consumed by the juggernauts and no second thought is given? No. I think it's a shame. But I also think the output will never match the artisans who practice now. By the very nature of the machines we employ, we cannot match the skill or thought that goes into bespoke code.
If I spend 2 hours designing the domain model, 1 hour slopping out a rough implementation, and 5 hours polishing it with a combo of handwritten and vibed refactorings, I will get a better result than if I spent 8 hours writing everything by hand.
So my point is not that vibe software is lower quality, as my experience has shown the opposite. It is simply that the spirit of sharing my work was done with the idea that I was sharing it with others who toiled in the same craft, not sharing for consumption by machine. Not that I ever contributed anything very important to the open source world, that anybody depended on. Just personal projects I thought were neat or educational.
In hindsight I would probably still have open sourced what I did, because I think it's valuable to have on record that I competently programmed stuff before AI even existed, like pre-atomic steel. But I don't know if I will open source any personal code going forward.
====
To put it more succinctly: if somebody "ripped off" my open source code in 2018, I wasn't mad about that. Even if they didn't bother to attribute me, well, at least they saw my stuff, had a human brain cell light up appreciating it, and thought it was worth stealing. I'm flattered. But with LLMs my work can be reappropriated without a single human ever directly knowing or caring about it.
It turns out that's false. We know that genes are patentable; remember back during the Human Genome Project, when there was such a rush to patent them? So genes are IP. (This seems bizarre to me, since they're patenting something that was found just sitting there, but this is what the system says right now.)
Well, two other humans (aka mom and dad) did create me, based on those patentable genes (and most likely including some genes that were, in fact, patented).
I'm not sure what to conclude from all of that, but I do think that it invalidates your argument.
You are presumably human. We have granted humans specific exemptions in copyright law. We have not granted that to LLMs. Why are we so eager to?
We gave certain temporary monopoly on certain uses to humans under rules little understood by laymen even if their livelihood depends on it.
Are you telling me that I can use the thing, but I can't use it if I process it through an LLM? It gets slippery, fast.
If I write a story, I can put it online. That doesn't mean it's ok to take that story and publish it in an anthology.
There's also a TON of irony here. What an about face it is, for the community at large* to switch from "information wants to be free, we support copyleft and FOSS" to leaning so heavily on an incredibly conservative reading of IP law.
It doesn't need to. Laws are for humans.
Laws don't give rights to chainsaws. Or lawnmowers. Or kitchen knives, hammers, screwdrivers, and spades.
You can't use any of those to commit a crime and then claim that the law specifically did not exclude those tools.
Why are you seemingly in favour of carving out an exemption for LLMs?
Laws are for humans.
Arguing that the law did not specifically address "intentionally killing a person by tickling them till they died" means that you found a loophole which can be used to kill people is...
well, it's in the "not even wrong" category...
If we take the point of view that LLMs are tools (I agree), then people need to be absolutely certain that these tools don't contain (compressed) representations of copyrighted works.
People seem not to want to do that. And they argue that the LLMs have "learned" or "been inspired" by the copyrighted works, which is OK for humans.
This is the problem. People can't even agree on which of two mutually exclusive defenses to appeal to! Are LLMs tools which we have to ensure aren't used to reproduce copyrighted work without permission, or are they entities that can be granted exemptions like humans can? It can't be both!
> There's also a TON of irony here. What an about face it is, for the community at large* to switch from "information wants to be free, we support copyleft and FOSS" to leaning so heavily on an incredibly conservative reading of IP law.
True. While IP-owning companies like Microsoft now say "it's online, so we can use it".
It's bizarre.
I'll tell you what: I'll drop my conservative stance in defense of FOSS when Windows and the latest Hollywood movie are "fair use" for consumption by whatever LLM I cook up.
Since this is a new language, and not documented on the web nor on Github, Claude's ability is not based off of stolen IP. At best it's trained on other language concepts, just like we can train ourselves on code on GitHub.
Maybe a good reason to create a new programming language?
Note: IANAL. The above is just from my current understanding.
I don't think there's even a valid argument for any other ownership model, or at least none that I can think of.
The primary issue being that it's all built on stolen data in the first place.
In order to have a sane conversation about this we have to all agree not to lie.
Compilation and translation happen in a generic manner and do not rely on a mountain of other IP. A compiler is really just a transformative tool that happens to do something useful; it was constructed to be a very precise translation, to the point that any mistakes in it are called bugs and we fix them to ensure the process stays deterministic. Translators try hard to 'get it right' too: to affect the intentions of the original author as little as possible.
When you use a model loaded up with noise or that you have trained exclusively on code that you actually wrote I think a strong case could be made that you own the copyright on that work product. But when you train that model on other people's work, especially without their consent or use a model that has been trained in that way you lose your right to call the output of that model yours.
You did not write it, and the transformative process requires terabytes of other people's IP and only a little bit by you.
As soon as you can prove that your contribution substantially outweighs the amount of IP contributed in total you would have a much stronger case.
I think I may have misunderstood your original comment above. It seems you intended to say:
No, that human owns the copyright on the prompt, not necessarily on the work product. The human may partially have copyright over the work product as well, "how much" being dependent on how much new creative expression from the human was involved vs that from others.
Both the compiler (in absence of inclusion of copyrighted libraries) and the LLM are considered to not add creative work and thus do not change copyright status of the works they transform.
You can consider the training set of the LLM or other AI model to be 3rd-party libraries, and the level of copyright from them that applies to the final output to be how much can be directly considered derivative, just as reading copyrighted code and being inspired by it does not pass that copyright to your work unless it's obviously derivative.
I like this comparison -- training set as '3rd party libraries'. Except, of course, that the authors behind the training set may not have actually granted permission to use, whereas the 3rd party libraries usually have some permission by way of license.
Adding two subtle points:
>> Indeed a developer owns copyright over the source code and on the compiled binaries, because there is no expansion happening here but just a translation from one format into another ... does not rely on a mountain of other IP
... and, the license agreement of the compiler and libraries used / linked to practically always explicitly waive copyrights over the said non-mountain of IP.
>> As soon as you can prove that your contribution substantially outweighs the amount of IP contributed in total you would have a much stronger case.
... a much stronger case that you have a partial copyright over the work, which is now likely a derivative work. You still may not have a case that you own the copyright exclusively (or as the original article says, that your employer does).
If the compiled binaries (output) were produced by running the input (source code) over every program written, then sure.
But that's not what's happening with compilers, is it? The output of a prompt is dependent on copyrighted work of others every single time it is run.
The output of a compiler is not dependent on the copyrighted output of every other program.
However:
1. The "every"ies in your comment are not to be taken literally either. :-)
>> If the compiled binaries (output) were produced by running the input (source code) over every program written, then sure.
2. More importantly, the above seems cyclically dependent on whether output from generative AI is deemed to be in public domain or not, which I consider is an open-ended issue as of now. It is not so 'sure' as yet. :-)
The humans at the bottom who were crushed should blame the boulder, which happened to be moving.
If you only get copyright for the prompt you make, but not the output, then it's like being responsible only for the prompt, but not the output.
I.e., he's only responsible for pushing the boulder up the hill. The fact that it rolled down from the hill and crushed someone's house "isn't his fault" (he doesn't get copyright on it).
>The Office concludes that, given current generally available technology, prompts alone do not provide sufficient human control to make users of an AI system the authors of the output. Prompts essentially function as instructions that convey unprotectible ideas. While highly detailed prompts could contain the user’s desired expressive elements, at present they do not control how the AI system processes them in generating the output.
If you're not the author then why would you have to be liable for it?
If you do not understand this, make sure that you always operate within a framework of people who do, because this sort of misunderstanding can cause you a world of grief.
Because you are the person shipping it, and as such regular liability applies. If I'm not the author of a book, and make a lot of copies and distribute those I'm liable for the content of that book, regardless of whether or not I hold the copyright to it. Conversely, if the original author sues because they feel their work infringes then that too is a liability that stems from the distribution.
And 'distribution' is a pretty wide term, not unlike 'interstate commerce', lots of things that you might not consider to be distribution can be classified as such in court.
Different laws do not come in packages, they apply individually, and sometimes they apply collectively but it isn't a menu where you can pick the combination that you think makes the most sense.
Technically when you select "copy image" instead of "copy image url" and paste that to a friend you're often committing copyright infringement. Do I think this is reasonable? Absolutely not. The same goes for this - the author should hold liability, so make the person who ends up causing the work to exist the damn author.
But nooo, we can't have that. Instead we need to have these convoluted exceptions that don't at all work how the real world works, so that lawyers can have even more work.
Besides, if we go by "the law" then we already have a court case where training an AI model is protected by fair use. But obviously that isn't satisfying enough for people, so they keep talking about how it's stealing (refer to my first sentence).
Also, this situation is going to get funny when some country decides that AI generated content does get copyright protection.
You are completely misunderstanding GP's distinction between ownership and liability.
In short, if you use someone else's car to kill someone, you are still liable for killing that person even though you don't own the car.
Do you disagree with that statement?
> Besides, if we go by "the law" then we already have a court case where training an AI model is protected by fair use.
Yes, but training an AI is a completely different thing than distributing the work product generated by that AI.
Note that I don't agree with all aspects of copyright law either, but I'll be happy to play by the rules as set today simply because I can't afford to be wrong and held liable for infringement. For instance I strongly believe that the length of copyright is a problem (and don't get me started on patents, especially on software). I also believe that only the original author should have copyright, not the company they worked for, their heirs (see Ravel for a really nasty case) or anybody else. I believe they should not be transferable at all.
But because I'm a nobody and not wealthy enough to challenge the likes of Disney in court I play by the rules.
As for 'this situation is going to get funny when some country decides that AI generated content does get copyright protection':
Copyright is one of the most harmonized legislative constructs in the world. Almost every country has adopted it, often without meaningful change. In practice US courts are obviously a very important driver behind changes in copyright law, but in general these changes tend to lean towards more protection for copyright owners, not less. So far the Trump admin has not touched copyright law in its usual heavy-handed manner. I'm not sure if this is by design or by accident, but maybe there are lines that even they cannot easily cross without massive consequences.
Some parties in the AI/copyright debate are talking out of both sides of their mouth. For instance, Microsoft is heavily relying on being able to infringe on copyright at will, but at the same time they are jealously guarding their own code. Such hypocrisy is going to be the main wedge that those in favor of strong copyright will use to reduce the chances that AI work product deserves copyright. After all, if it is original and not transformative, then Microsoft could (and should!) train their AI on their own confidential code. But they're not doing that; maybe they know something you and I do not...
The same point applies if an animal takes a picture.
See:
https://technophilosoph.com/en/2025/02/07/ai-prompts-and-out...
If you have a more recent citation to case law that states the opposite, that would be great, but AFAIK this article reflects the current state of affairs.
The human using the tool creates a prompt, and there is then an automatic transformation of the prompt into code. Such an automatic transformation is generally accepted as not creating a new work (after all, anybody else entering the same prompt would have a reasonable expectation of generating the same output, modulo some noise due to versioning and possibly other local context).
Claude Code, and AI-generated code in general, does not at present create a new work. But the prompt, the part that you input, may be sufficiently creative to warrant copyright protection.
Every developer I’ve seen use these tools has engaged in a meaningful contribution: specific directions across multiple prompts, often (though not always) editing the code afterwards, manually running the code and prompting for changes, etc.
Until the courts, legislators, or the copyright office define something otherwise, I’m highly confident in my assertion. (Mostly because of the insane number of hours I’ve spent with counsel on this. And, as a disclaimer, since I am biased: I worked on Copilot and Google’s various AI-assisted coding products as an SVP and VP.)
The fact that meaningful contribution has not been defined is a strong signal that things are not nearly as clear-cut as you make them out to be. Until there is a ruling that clearly establishes that the person who wrote the prompt owns the copyright on the code, I think it is misleading to suggest that this is already the case. Your lawyers are not the lawyers of the parties that will end up hurt if it turns out not to be so.
For contrast: we have a very clear idea of what things are copyrighted, and in general these things do not rest on a foundation of IP appropriated from others outside of the license terms. The fact that the infringement is fine-grained and effectively harms the rights of thousands or more individuals doesn't change the heart of the matter: whoever wrote the code, it wasn't you.
Given your bias, I'm not surprised that this would be your argument. Effectively you have created a copyright laundromat using code of which you were nominally the steward and not the owner, but whether it stands long term is not up to your lawyers.
Suppose you warrant that you wrote the code yourself, and then it is found that your code infringes on code owned by other entities. Now you have a tough choice: admit you lied about writing the code yourself, tainting all of the code you claim to have written since these tools became available, or stand and take the infringement penalty, which could be very substantial.
Judges and courts don't like playing silly games like this.
I've sued two parties for copyright infringement and won, and a third settled out of court for a substantial sum. You don't tell a judge that you don't need to prove you wrote the code; that's an automatic loss. Then there are such things as expert witnesses, who will interview you and check how much you know about the code you claim you wrote.
This doesn't really make sense; in no way can an "expert" interview definitively assert someone wrote a piece of code or not, especially if the person has access to the code beforehand.
I believe the standard can be as low as "more likely than not".
Also, when it comes to code, the case is even more damning, because the vast majority of the code that LLMs are trained on was not only copyrighted but subject to an MIT license (at best), and even the MIT license, one of the most permissive licenses in existence, still says clearly:
"Permission is hereby granted, free of charge, to any person obtaining a copy of this software"
The word 'person' is used very intentionally here.
I think there should be several kinds of AI taxes which should be distributed to all copyright holders. There should be a tax to go to writers (and book authors), a tax to go to open source developers and a tax for the general population to distribute as UBI to account for small-form content like comments and photography...
People invested a lot of time building their entire careers around the assumption of copyright protection; so for it to be violated on such a scale would be a massive betrayal.
Copyright already excludes short phrases for the same reason: there are only so many ways in which short phrases could be produced. The moment a work becomes larger (large enough; AFAIK, the threshold is not precisely defined), the reasoning you applied no longer applies.
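The Google v. Oracle lawsuit did not decide whether APIs (when large in number) are copyrightable or not; the Supreme Court assumed, for the sake of argument, that they were and decided the case on fair use.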
That's like saying "there's only so many ways to greet your neighbor, so any text that simply greets your neighbor isn't copyrightable – and therefore no text is copyrightable".
I can totally see this applying here as well.
Now, this doesn't resolve the issue of AIs being trained on copyrighted works they had no rights to. The counterargument is that this is a derivative or transformative work, but I don't believe that's settled law at all.
[1]: https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...