It was shown, in this case, that the LLMs wouldn’t generate accurate quotes more than 60 words long.
This is not comparable to encoding a full video file.
Then what if their memory is so good that they repeat entire sections verbatim when asked? Does that violate it? I’d say it’s grey.
But that’s a very specific case - reproducing large chunks of owned work is something that can be quite easily detected and prevented, and I’m almost certain the frontier labs are already doing this.
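To make "easily detected" a bit more concrete, here's a rough sketch of the kind of check I have in mind: find the longest run of consecutive words a model output shares with a known source text, and flag it if the run passes some threshold. To be clear, this is my own toy illustration and not how any lab actually does it. The function names are made up, and the 60-word default is just borrowed from the figure mentioned above.

```python
def longest_verbatim_run(output: str, source: str) -> int:
    """Length, in words, of the longest run of consecutive words
    that appears in both texts (case-insensitive, whitespace-split)."""
    a = output.lower().split()
    b = source.lower().split()
    best = 0
    # prev[j] = length of the shared run ending at a[i-2] and b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def looks_like_verbatim_copy(output: str, source: str, max_words: int = 60) -> bool:
    """Hypothetical filter: True if the output copies more than
    max_words consecutive words from the source. The threshold is an
    assumption for illustration, not a legal standard."""
    return longest_verbatim_run(output, source) > max_words
```

This is quadratic in the text lengths, so a real system would need something smarter (hashing n-grams against an index, for example), but the basic idea is that verbatim reproduction is a string-matching problem.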
So I think it’s really just not clear - the reality is this is a novel situation, and the job of the courts is now to basically decide what’s allowed and what’s not. But the rationale shouldn’t be ‘this can’t be fair use, it’s just compression’. Because it’s clearly something fundamentally different, and existing laws just aren’t applicable imo.
There's a whole related topic here in the realm of news (since it's shorter form), but it also has a much shorter half-life. Not sure what I think there yet.