You don't need a license to scrape the public web, analyze it, turn it into tokens, or apply other transformations to it. Let's not expand copyright beyond the horrible monster it already is.
I think it's likely that US law will continue to find training on scraped, unlicensed data to be legal.

That doesn't mean much to the many people I know of who refuse to use a technology they see as unethically created from the work of others without compensating them.

I continue to hope that someone will train a "vegan" model on licensed or out-of-copyright data so those people can experience the benefits of this class of technology.

(I compare them to vegans because, like vegans, I think their ethical position is credible and has merit even though I do not choose the same ethical framework for myself.)

This is as ethical as it gets. They're getting compensated by being able to use the result of their work freely. This is the rising tide that lifts all boats.