Probably, yes, but the burden of proof is on us, not them.

I'm already glad some companies have the guts to open their models, because proving it for open models is probably a lot easier than for a model behind a service.

reply
The proof is the $stupid-billion infrastructure built and kept up to host mousetraps armed with free cheese made of virtue signalling about doing the right thing and sharing the code with the world for free.
reply
That's a matter of changing a law; it's all up to the people and their representatives. We talk as if everything is set in stone, but if there really is a will, there is a way.
reply
What's an example of data that might have been stolen?
reply
The media industry loves to quote ridiculous numbers on lost revenue due to piracy etc. Maybe rough ballpark numbers will get them to do something about this theft.

Can someone put a rough estimate on the potential revenue loss (direct and incidental) from training AI, with an industry-wise breakdown?

reply
It’s wrong to stop progress. I just want to know what data went into my model and have access to the same data. The same way we have national libraries of books but with the caveat that I don’t really know how one is supposed to browse petabytes of OpenAI .zips like I browse old books.

If the data is proprietary (e.g., Meta’s stash of FB comments), then I am satisfied to be told it’s private and I can’t see it. If, however, the works were public, then give me a URL if it’s live or a cached copy if it isn’t.

reply