undefined

[-]

Copyright infringement and plagiarism are not the same or even very closely related. They're different concepts and not interchangeable. Relative to copyright infringement, cases of plagiarism are rarely a matter for courts to decide or care about at all. Plagiarism is primarily an ethical (and not civil or criminal) matter. Rather than be dealt with by the legal system, it is the subject of codes of ethics within e.g. academia, journalism, etc. which have their own extra-judicial standards and methods of enforcement.

by dekhn23 hours ago|

[-]

I suspect they were instead referring to patents; for example, when I worked at Google, they told the engineers not to read patents because then the engineer might invent something infringing, I think it's called willful infringement. No other employer I've worked for has every raised this as an issue, while many lawyers at google would warn against this.

by martin-t10 hours ago|

[-]

You're right, legally speaking.

But you shouldn't be right. I mean, morally.

The law is a compromise between what the people in power want and what they can get away with without people revolting. It has nothing to do with morality, fairness or justice. And we should change that. The promise of democracy was (among other things) that everyone would be equal, everybody would get to vote and laws would be decided by the moral system of the majority. And yet, today, most people will tell you they are unhappy about the rising cost of living and rising inequality...

The law should be based on complete and consistent moral system. And then plagiarism (taking advantage of another person's intellectual work without credit or compensation) would absolutely be a legal matter.

by martin-t1 days ago|

[-]

As opposed to an irregular person?

LLMs are not persons, not even legal ones (which itself is a massive hack causing massive issues such as using corporate finances for political gain).

A human has moral value a text model does not. A human has limitations in both time and memory available, a model of text does not. I don't see why comparisons to humans have any relevance. Just because a human can do something does not mean machines run by corporations should be able to do it en-masse.

The rules of copyright allow humans to do certain things because:

- Learning enriches the human.

- Once a human consumes information, he can't willingly forget it.

- It is impossible to prove how much a human-created intellectual work is based on others.

With LLMs:

- Training (let's not anthropomorphize: lossily-compressing input data by detecting and extracting patterns) enriches only the corporation which owns it.

- It's perfectly possible to create a model based only on content with specific licenses or only public domain.

- It's possible to trace every single output byte to quantifiable influences from every single input byte. It's just not an interesting line of inquiry for the corporations benefiting from the legal gray area.

by afro8818 hours ago|

[-]

Dude come on, I clearly wasn't saying LLMs are people. My point was it's a tool and it's the responsibility of the person wielding it to check outputs.

If it's too hard to check outputs, don't use the tool.

Your arguments about copyright being different for LLMs: at the moment that's still being defined legally. So for now it's an ethical concern rather than a legal one.

For what it's worth I agree that LLMs being trained on copyright material is an abuse of current human oriented copyright laws. There's no way this will just continue to happen. Megacorps aren't going to lie down if there's a piece of the pie on the table, and then there's precedent for everyone else (class action perhaps)

by martin-t5 hours ago|

[-]

Alright, I did make that assumption because I've seen and heard people talk about LLM as people. It worries me that otherwise functional and reasonable people, some of them my friends, have been so easily been convinced by a machine which demonstrated its flaws to me daily.

As for checking outputs - I don't believe that's sufficient. Maybe the letter of the law is flawed but according to the spirit the model itself is derivative work.

A model takes several orders of magnitude more work as training data than it takes to code the training algorithm itself, to any reasonable and sane person, that makes it a derivative work of the training data by nearly 100% - we can only argue how many nines it should be.

> precedent

Yeah but the US system makes me very uneasy about it. The right way to do this is to sit down, talk about the options and their downstream implications, talking about fairness and justice and then deciding what the law should be. If we did that, copyright law would look very different in the first place and this whole thing would have an obvious solution.

[-]

How could you do that though? You can’t guarantee that there aren’t chunks of copied code that infringes.

by Andrex1 days ago|

[-]

Let me introduce you to the concept of submarine patents...

by shevy-java1 days ago|

[-]

But the responsible party is still the human who added the code. Not the tool that helped do so.

by aargh_aargh1 days ago|

[-]

The practical concern of Linux developers regarding responsibility is not being able to ban the author, it's that the author should take ongoing care for his contribution.

by Cytobit1 days ago|

[-]

That's not going to shield the Linux organization.

by cxr1 days ago|

[-]

A DCO bearing a claim of original authorship (or assertion of other permitted use) isn't going to shield them entirely, but it can mitigate liability and damages.

[-]

Can it though? As far as I know this hasn’t been tested.

by 20 hours ago|

[-]

deleted

[-]

In a court case the responsibility party very well could be the Linux foundation because this is a foreseeable consequence of allowing AI contributions. There’s no reasonable way for a human to make such a guarantee while using AI generated code.

by Chance-Device1 days ago|

[-]

It’s not about the mechanism: responsibility is a social construct, it works the way people say that it works. If we all agree that a human can agree to bear the responsibility for AI outputs, and face any consequences resulting from those outputs, then that’s the whole shebang.

[-]

Sure we could change the law. It would be a stupid change to allow individuals, organizations, and companies to completely shield themselves from the consequences of risky behaviors (more than we already do) simply by assigning all liability to a fall guy.

by 1 days ago|

[-]

deleted

by Chance-Device1 days ago|

[-]

What law exactly are you suggesting needs to be changed? How is this any different from what already happens right now, today?

[-]

Right now it's very easy not to infringe on copyrighted code if you write the code yourself. In the vast majority of cases if you infringed it's because you did something wrong that you could have prevented (in the case where you didn't do anything wrong, inducement creation is an affirmative defense against copyright infringement).

That is not the case when using AI generated code. There is no way to use it without the chance of introducing infringing code.

Because of that if you tell a user they can use AI generated code, and they introduce infringing code, that was a foreseeable outcome of your action. In the case where you are the owner of a company, or the head of an organization that benefits from contributors using AI code, your company or organization could be liable.

by galaxyLogic15 hours ago|

[-]

So it's a bit as if Linux Organization told its contributors you can bring in infringing code but you must agree you are liable for any infringement?

But if a lawsuit was later brought who would be sued? The individual author or the organization? In other words can an organization reduce its liability if it tells its employees "You can break the law as long as you agree you are solely responsible for such illegal actions?

It would seem to me that the employer would be liable if they "encourage" this way of working?

by Chance-Device1 days ago|

[-]

It’s a foreseeable outcome that humans might introduce copyrighted code into the kernel.

I think you’re looking for problems that don’t really exist here, you seem committed to an anti AI stance where none is justified.

[-]

A human has to willingly violate the law for that to happen though. There is no way for a human to use AI generated that doesn't have a chance of producing copyrighted code though. That's just expected.

If you don't think this is a problem take a look at the terms of the enterprise agreements from OpenAI and Anthropic. Companies recognize this is an issue and so they were forced to add an indemnification clause, explicitly saying they'll pay for any damages resulting in infringement lawsuits.

by 1 days ago|

[-]

deleted

by johnisgood22 hours ago|

[-]

> Right now it's very easy not to infringe on copyrighted code if you write the code yourself.

Humans routinely produce code similar to or identical to existing copyrighted code without direct copying.

[-]

They don’t produce enough similar code to infringe frequently. And if they did independent creation is an affirmative defense to copyright infringement that likely doesn’t apply to LLMs since they have the demonstrated capability to produce code directly from their training set.

by johnisgood22 hours ago|

[-]

You have shifted from "very easy not to infringe" to "don't infringe frequently", which concedes the original point that humans can and do produce infringing code without intent.

On independent creation: you are conflating the tool with the user. The defense applies to whether the developer had access to the copyrighted work, not whether their tools did. A developer using an LLM did not access the training set directly, they used a synthesis tool. By your logic, any developer who has read GPL code on GitHub should lose independent creation defense because they have "demonstrated capability to produce code directly from" their memory.

LLM memorization/regurgitation is a documented failure mode, not normal operation (nor typical case). Training set contamination happens, but it is rare and considered a bug. Humans also occasionally reproduce code from memory: we do not deny them independent creation defense wholesale because of that capability!

In any case, the legal question is not settled, but the argument that LLM-assisted code categorically cannot qualify for independent creation defense creates a double standard that human-written code does not face.

by sarchertech5 hours ago|

[-]

> You have shifted from "very easy not to infringe" to "don't infringe frequently", which concedes the original point that humans can and do produce infringing code without intent.

Practically speaking humans do not produce code that would be found in court to be infringing without intent.

It is theoretically possible, but it is not something that a reasonable person would foresee as a potential consequence.

That’s the difference.

> LLM memorization/regurgitation is a documented failure mode, not normal operation (nor typical case).

Exactly. It is a documented failure mode that you as a user have no capacity to mitigate or to even be aware is happening.

Double standards are perfectly fine. LLMs are not conscious beings that deserve protection under the law.

>not settled.

What appears to likely be settled is that human authorship is required, so there’s no way that an LLM could qualify for independent creation.

by direwolf2017 hours ago|

[-]

And that's not an infringement. Actual copying is the infringement, not having the same code. The most likely way to have the same code is by copying, but it's not the only way.

by bpt31 days ago|

[-]

In this case, the "fall guy" is the person who actually introduced the code in question into the codebase.

They wouldn't be some patsy that is around just to take blame, but the actual responsible party for the issue.

[-]

Imagine your a factory owner and you need a chemical delivered from across the country, but the chemical is dangerous and if the tanker truck drives faster than 50 miles per hour it has a 0.001% chance per mile of exploding.

You hire an independent contractor and tell him that he can drive 60 miles per hour if he wants to but if it explodes he accepts responsibility.

He does and it explodes killing 10 people. If the family of those 10 people has evidence you created the conditions to cause the explosion in order to benefit your company, you're probably going to lose in civil court.

Linus benefits from the increase velocity of people using AI. He doesn't get to put all the liability on the people contributing.

by raincole17 hours ago|

[-]

Cool analogy! Which has nothing to do with the topic in hand.

by bpt322 hours ago|

[-]

That is a nonsensical analogy on multiple levels, and doesn't even support your own argument.

[-]

Nice rebuttal.

by bpt322 hours ago|

[-]

Why would I put much effort into responding to a post like yours, which makes no sense and just shows that you don't understand what you're talking about?

by sarchertech4 hours ago|

[-]

Why would you put any effort into it at all?

by lo_zamoyski1 days ago|

[-]

Responsibility is an objective fact, not just some arbitrary social convention. What we can agree or disagree about is where it rests, but that's a matter of inference, an inference can be more or less correct. We might assign certain people certain responsibilities before the fact, but that's to charge them with the care of some good, not to blame them for things before they were charged with their care.

by bitwize1 days ago|

[-]

Because contributions to Linux are meticulously attributed to, and remain property of, their authors, those authors bear ultimate responsibility. If Fred Foobar sends patches to the kernel that, as it turns out, contain copyrighted code, then provided upstream maintainers did reasonable due diligence the court will go after Fred Foobar for damages, and quite likely demand that the kernel organization no longer distribute copies of the kernel with Fred's code in it.

[-]

Anyone distributing infringing material can be liable, and it’s unlikely that this technicality will actually would shield anyone.

Anyone who thinks they have a strong infringement case isn’t going to stop at the guy who authored the code, they’re going to go after anyone with deep pockets with a good chance of winning.

by Marha0112 hours ago|

[-]

> Anyone distributing infringing material can be liable

There is still the "mens rea" principle. If you distribute infringing material unknowingly, it would very likely not result in any penalties.

by sarchertech4 hours ago|