upvote
So it's a bit as if Linux Organization told its contributors you can bring in infringing code but you must agree you are liable for any infringement?

But if a lawsuit was later brought who would be sued? The individual author or the organization? In other words can an organization reduce its liability if it tells its employees "You can break the law as long as you agree you are solely responsible for such illegal actions?

It would seem to me that the employer would be liable if they "encourage" this way of working?

reply
It’s a foreseeable outcome that humans might introduce copyrighted code into the kernel.

I think you’re looking for problems that don’t really exist here, you seem committed to an anti AI stance where none is justified.

reply
A human has to willingly violate the law for that to happen though. There is no way for a human to use AI generated that doesn't have a chance of producing copyrighted code though. That's just expected.

If you don't think this is a problem take a look at the terms of the enterprise agreements from OpenAI and Anthropic. Companies recognize this is an issue and so they were forced to add an indemnification clause, explicitly saying they'll pay for any damages resulting in infringement lawsuits.

reply
deleted
reply
> Right now it's very easy not to infringe on copyrighted code if you write the code yourself.

Humans routinely produce code similar to or identical to existing copyrighted code without direct copying.

reply
They don’t produce enough similar code to infringe frequently. And if they did independent creation is an affirmative defense to copyright infringement that likely doesn’t apply to LLMs since they have the demonstrated capability to produce code directly from their training set.
reply
You have shifted from "very easy not to infringe" to "don't infringe frequently", which concedes the original point that humans can and do produce infringing code without intent.

On independent creation: you are conflating the tool with the user. The defense applies to whether the developer had access to the copyrighted work, not whether their tools did. A developer using an LLM did not access the training set directly, they used a synthesis tool. By your logic, any developer who has read GPL code on GitHub should lose independent creation defense because they have "demonstrated capability to produce code directly from" their memory.

LLM memorization/regurgitation is a documented failure mode, not normal operation (nor typical case). Training set contamination happens, but it is rare and considered a bug. Humans also occasionally reproduce code from memory: we do not deny them independent creation defense wholesale because of that capability!

In any case, the legal question is not settled, but the argument that LLM-assisted code categorically cannot qualify for independent creation defense creates a double standard that human-written code does not face.

reply
> You have shifted from "very easy not to infringe" to "don't infringe frequently", which concedes the original point that humans can and do produce infringing code without intent.

Practically speaking humans do not produce code that would be found in court to be infringing without intent.

It is theoretically possible, but it is not something that a reasonable person would foresee as a potential consequence.

That’s the difference.

> LLM memorization/regurgitation is a documented failure mode, not normal operation (nor typical case).

Exactly. It is a documented failure mode that you as a user have no capacity to mitigate or to even be aware is happening.

Double standards are perfectly fine. LLMs are not conscious beings that deserve protection under the law.

>not settled.

What appears to likely be settled is that human authorship is required, so there’s no way that an LLM could qualify for independent creation.

reply
And that's not an infringement. Actual copying is the infringement, not having the same code. The most likely way to have the same code is by copying, but it's not the only way.
reply