Derivative works can also run afoul of copyright. An LLM trained on a corpus of copyrighted code is creating derivative works no matter how obscure the process is.
reply
This actually isn't what legal precedent currently says. Courts are looking at actual output, not at models being "tainted" by their training data. If you think this is morally wrong, look into getting the laws changed (serious).
reply
What about a human with 30 years of experience working with copyrighted codebases?
reply
Said human would likely not be able to create a clean-room implementation of any of the codebases they worked on.
reply
U.S. District Judge William Alsup ruled that Anthropic made "fair use" of the books, deeming the training "exceedingly transformative."

"Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different"

reply
I disagree that information flow is required. Do you have a reference for that? Certainly it is an important consideration. But consider all the real literary works contained in the infinite Library of Babel.[1] Are they original works just because no copy was used to produce them?

[1]: https://libraryofbabel.info/

reply
Yes; the works are original.

However, describing the path you need to get there requires copyright infringement.
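That point can be made concrete. A minimal sketch (assumptions: Python, a made-up 29-symbol alphabet; the real libraryofbabel.info uses its own scheme) of the Library of Babel idea: every text over a fixed alphabet has a unique integer "address" in the enumeration of all texts, and the address is just a re-encoding of the text, so naming the address conveys exactly as much information as handing over the text itself.

```python
# Bijective base-k numbering between texts and integer "shelf addresses".
# ALPHABET is a hypothetical choice for illustration.
ALPHABET = "abcdefghijklmnopqrstuvwxyz ,."
BASE = len(ALPHABET)

def address_of(text: str) -> int:
    """Map a text to its unique index in the enumeration of all texts."""
    # "" -> 0, "a" -> 1, "b" -> 2, ..., "aa" -> BASE + 1, ...
    n = 0
    for ch in text:
        n = n * BASE + ALPHABET.index(ch) + 1
    return n

def text_at(address: int) -> str:
    """Invert address_of: recover the text stored at a given address."""
    chars = []
    while address > 0:
        address, r = divmod(address - 1, BASE)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))

# Round-trip: the address encodes the text exactly.
assert text_at(address_of("hello world")) == "hello world"
```

Because `address_of` and `text_at` are inverses, "the path to the work" and "a copy of the work" are the same data in different notation.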

reply
Well, discovery might be a fun exercise to see whether the code is in the LLM's training dataset.
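A hedged illustration of the kind of check that exercise might involve (all names and the line-overlap heuristic are my own, not anything a court or vendor actually uses): does a disputed snippet appear verbatim, line for line, in a training-corpus document?

```python
# Toy membership check: what fraction of a snippet's non-blank lines
# appear verbatim (ignoring indentation) in a corpus document?
def normalize(code: str) -> list[str]:
    """Strip whitespace and drop blank lines so formatting doesn't hide a match."""
    return [line.strip() for line in code.splitlines() if line.strip()]

def overlap_ratio(snippet: str, corpus_doc: str) -> float:
    """Fraction of the snippet's lines found in the corpus document (0.0-1.0)."""
    snippet_lines = normalize(snippet)
    corpus_lines = set(normalize(corpus_doc))
    if not snippet_lines:
        return 0.0
    hits = sum(1 for line in snippet_lines if line in corpus_lines)
    return hits / len(snippet_lines)
```

Real provenance analysis would need fuzzier matching (renamed identifiers, reformatting), but even a crude ratio like this shows what "the code is in the dataset" could mean operationally.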
reply
if?
reply