This comes with the uncomfortable implication that it's impossible to tell to what extent LLMs are actually pulling together snippets of GPL'd code, and to what extent that is legally acceptable.
> and to what extent is that legally acceptable.

De jure, not at all.

Parallel creation is a very weak defense against copyright infringement claims. It is practically impossible to prove for humans, much to the annoyance of musicians: "Go prove in court that you have never heard this song, not even in the background somewhere."

LLMs, having been trained on all the software their makers could get their hands on, will fail this test. There is no parallel creation claim to be had. AI firms love to trot out the "they learn just like humans" line, which is both false and irrelevant. It's copyright infringement when humans do it too. If you view a GPL'd repo and later reproduce the code unintentionally? Still copyright infringement.

De facto, though, things are different, and the technical details behind LLMs are irrelevant. AI companies lie and frustrate discovery, whilst begging politicians to pass laws legalizing their copyright infringement.

There won't be a copyright reckoning, not anymore. All the dumb politicians think AI is going to bail out their economies.

Wow, I did not expect such a perfect reproduction. Link to the actual source code (before it was rewritten):

https://github.com/chardet/chardet/blob/5.0.0/chardet/mbchar...
