It doesn't seem like this is proving much of anything? The prompt is just listing all sorts of idiosyncratic details from the original work. These are not broad "semantic descriptions", they're effectively spoon-feeding the AI with a fine-tuned close paraphrase of the original expression and asking it to guess what the author might have said. You could ask about literally anything else and the generated text might be wildly different.

This is just the equivalent of saying that monkeys could write Shakespeare by banging on a typewriter. There are hardly any copyright implications here.

They use GPT-4o to generate plot summaries from verbatim quotes. This might introduce an information leak that makes a word-for-word identical generation more likely.

The authors don't test this possibility.
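
For what it's worth, a crude version of that check is easy to sketch. This is not anything from the paper, just one assumed way to measure it: count what fraction of the summary's word n-grams appear verbatim in the source passage. The function names and the toy strings are made up for illustration.

```python
import re


def word_ngrams(text, n):
    """Return the set of lowercase word n-grams in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def leaked_fraction(source, summary, n=3):
    """Fraction of the summary's n-grams that occur verbatim in the source.

    0.0 means no n-gram is shared; values near 1.0 suggest the summary
    is mostly copied phrasing rather than a paraphrase.
    """
    summary_grams = word_ngrams(summary, n)
    if not summary_grams:
        return 0.0
    return len(summary_grams & word_ngrams(source, n)) / len(summary_grams)


# Toy strings, not taken from any real prompt or book.
source = "the quick brown fox jumps over the lazy dog"
copied = "a quick brown fox appears"
paraphrase = "an animal leaps across a sleepy hound"

print(leaked_fraction(source, copied))      # one of three trigrams is shared
print(leaked_fraction(source, paraphrase))  # no shared trigrams
```

A high score on the real prompts would support the leak theory; a low one wouldn't rule it out, since a summary can leak rare names or word choices without sharing whole trigrams.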

BTW, is Jane C. Ginsburg (one of the authors) https://en.wikipedia.org/wiki/Jane_C._Ginsburg ?

IMHO giving many details in the prompt and asking the model to "fill in the blanks" feels a little like cheating in the same way as embedding the dictionary in the decompression program. But it will certainly make the Imaginary Property lawyers squirm.
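
To make the dictionary analogy concrete, here is a small sketch using zlib's preset-dictionary support. It only illustrates the compression side of the comparison and says nothing about how the model or the paper actually works; the sample sentence is made up.

```python
import zlib

# Made-up stand-in for "the original work"; any byte string would do.
text = (b"The lighthouse keeper counted seventeen gulls before breakfast, "
        b"then wrote their names in a ledger nobody else would ever read.")

# Ordinary compression: the compressor has never seen the text.
plain = zlib.compress(text, 9)

# "Cheating": hand both sides the text itself as a preset dictionary.
comp = zlib.compressobj(level=9, zdict=text)
primed = comp.compress(text) + comp.flush()

# The decompressor needs the same dictionary to recover the text.
decomp = zlib.decompressobj(zdict=text)
restored = decomp.decompress(primed) + decomp.flush()

assert restored == text
print(len(text), len(plain), len(primed))
```

The primed output is tiny only because the decompressor already holds the answer, which is the worry about prompts that carry most of the original's details.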

It's not cheating. It seems like a technique to defeat obfuscation and show that the content is there in complete or near-complete form, which proves it was copied.