Didn't the courts decide that if it's just for learning, everything is fair game?
I wonder if we could gamify and democratise it somehow, like Folding@home and Wikipedia...

I've been training a teeny specialised model that runs in a browser on a phone to detect harmonium notes played in a song (the harmonium turns out to be a PITA, but that's another story for another day). Getting good labelled data is _all_ of the hard work.

That being said, for cheap inference, using a big model to train something ultra-suited to the task at hand might be how we handle local inference. I'm thinking language-specific models.
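
For what it's worth, the "big model trains a small one" idea is usually called knowledge distillation. Here's a rough sketch of the loss you'd minimise, assuming both models emit logits over the same classes. The function names and the temperature value are my own illustrative picks, not from any particular library:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature > 1 flattens the distribution, exposing the
    # teacher's relative preferences among the non-top classes.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) over temperature-softened distributions.
    # Hypothetical helper: a real training loop would compute this on
    # batches inside an autodiff framework.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

The small model is then trained to push this loss towards zero on whatever inputs the big model has labelled, which sidesteps some (not all) of the hand-labelling pain.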

You don't need to have fully copyright-unencumbered datasets to build Open Source AI, as that (as you say) would be impossible. https://opensource.org/ai