If Linux were to contain 3rd party copyrighted code, the legal entity at risk of being sued would be... Linux users, which, given how widely Linux is deployed, is basically everyone on Earth, including all large companies.
Linux development is funded by large companies with big legal departments. It's safe to say that nobody is going to be picking this legal fight any time soon.
However, there is no legal precedent establishing that because contributors sign a DCO (Developer Certificate of Origin) and retain copyright, the Linux Foundation is not liable. The entire concept is unproven.
Large company legal departments aren’t a shield against this kind of thing: patent trolls routinely go after huge companies, and smaller companies routinely sue much larger ones over copyright infringement.
Linus makes $1.5 million per year from the Linux Foundation, and the foundation itself pulls in $300 million a year in revenue.
They are directly benefiting from contributors and if they cause harm through their actions there’s a good chance they’ll be held liable.
2. Infringement in closed source code isn’t as likely to be discovered
3. OpenAI’s and Anthropic’s enterprise agreements indemnify companies (essentially, pay for damages) for copyright issues.
There has to be an analogy to music or something here - except that code is even less copyrightable than melodies.
Yes, there might be some specific algorithms that are patented, but the average programmer won't be implementing any of those from scratch, they'll use libraries anyway.
Code being copyrightable is the entire basis for open source licenses.
What part of a bog-standard HTTP API can be copyrighted? Parsing the POST request or processing it or shoving it to storage? I'm genuinely confused here and not just being an ass.
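To make the question concrete, here is a minimal sketch (all names hypothetical) of the kind of bog-standard POST handler being described — parse, process, store — written the way almost anyone would write it:

```python
import json
from http.server import BaseHTTPRequestHandler

# In-memory stand-in for a database; purely illustrative.
STORE = []

def handle_post(raw_body: bytes) -> dict:
    """Parse a JSON POST body, process it, and store it."""
    record = json.loads(raw_body)   # parsing the POST request
    record["id"] = len(STORE) + 1   # "processing" it: assign an id
    STORE.append(record)            # shoving it to storage
    return record

class ApiHandler(BaseHTTPRequestHandler):
    """Boilerplate wiring into Python's stdlib HTTP server."""
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        record = handle_post(self.rfile.read(length))
        body = json.dumps(record).encode()
        self.send_response(201)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)
```

There are only so many ways to write this; two programmers who have never seen each other's code will converge on something essentially identical.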
There are unique algorithms for things like media compression etc.; I understand patenting those.
But for the vast majority of software, is there any realistic threat of reproducing code so unique that it's protected and could be identified as copied? There are only so many ways you can do a specific common thing.
I kinda think of it like music: without ever hearing a specific song, you might hit the same chord progression by accident, because in reality there are only so many combinations of notes that sound good.
I've worked at a company that was asked as part of a merger to scan for code copied from open source. That ended up being a major issue for the merger. People had copied various C headers around in odd places, and indeed stolen an odd bit of telnet code. We had to go clean it up.
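That kind of scan can start as simply as searching the tree for telltale license strings. A toy sketch, assuming a few example phrases (real audits use dedicated tools and far larger signature databases):

```python
import os
import re

# Phrases that commonly betray copied-in open source code.
# Example list only -- real scanners match against large databases.
TELLTALES = [
    re.compile(p, re.IGNORECASE) for p in (
        r"GNU General Public License",
        r"The Regents of the University of California",  # classic BSD-era header
        r"Copyright \(c\) \d{4}",
    )
]

def scan_tree(root: str) -> list[tuple[str, str]]:
    """Return (path, matched phrase) pairs for suspicious files under root."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith((".c", ".h", ".py")):
                continue
            path = os.path.join(dirpath, name)
            try:
                text = open(path, errors="ignore").read()
            except OSError:
                continue
            for pat in TELLTALES:
                match = pat.search(text)
                if match:
                    hits.append((path, match.group(0)))
    return hits
```

Matching literal strings like this only catches code copied with its headers intact; code pasted without attribution (like that telnet snippet) needs fingerprint-based tools to find.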
It’s no worse than non-AI assisted code.
I could easily copy-paste proprietary code, sign my name certifying that it isn’t proprietary and that it complies with the GPL, and submit it.
At the end of the day, it just comes down to a lying human.
But a human just using an LLM to generate code will do it accidentally. The difference is that regurgitation of training text is a documented failure mode of LLMs.
And there’s no way for the human using it to be aware it’s happening.
If you can’t be sure, don’t sign.