You're presenting this as legally clear, but it's not, and that works to the detriment of your point.

If I download all BSD software, count how many times "if" appears, and distribute that total, I've not violated the BSD license. AI-generated code is different from that, but not totally different.

Ignore nuance and the adults will ignore you.

Fair point, but would you say it would meaningfully change things if all LLMs were to ship with a wall of text of all BSD attributions that were found in the training set?
No, of course not. The issue is that code was copied and used, without adhering to the license, as training data. Even before training started, that's not right. That's the issue.

All of this would not be possible if laws were adhered to. This is very much a "the end justifies the means" situation. The same could be argued about e.g. the Netherlands and genocide/slavery.

The Netherlands is great; if you've ever been, it's pretty and nice and fun, and it culturally enriches western Europe. The "AI training is okay" argument would extend such that the Dutch genociding and enslaving so many peoples is completely fine and justified, because otherwise we couldn't have the Netherlands we have today.

I'm not arguing that it's generally and automatically ok, I'm just saying that it's probably also not right to see it as entirely and inherently immoral, and that some people are probably fine with their contributions to the public domain being used in it.

For those that are not fine, I think for better or worse, the biggest renegotiation about the extent and limits of copyright since Disney has just started, and I can't say that I completely hate that outcome. (I do find it quite telling that this is what it took, though.)
