It's not even an LLM, it's a dataset.

And clearly, if training on copyrighted material is fair use, as every LLM maker claims, then this license has literally no weight.

Also, NAL but IIRC an automatically generated dataset isn't copyrightable in the first place.

Honor among thieves?