upvote
To clarify, because a number of posts here sort of suggest the confusion:

the article here isn't about the LLM recognizing works that were in the training data. EG, The Old Man and the Sea off the shelf. It's about pegging the author of novel texts, like, say, some letter written by Hemmingway that gets discovered next week and was never before digitized.

reply
It is for now.

But I'm sure the scanning operations will start scouring the earth even harder for any books unaffected by slop containing niche knowledge and text in order for their models to have an edge over the ones trained only on pirate collections and the Internet.

I wonder if secondhand bookshops and deceased estates are seeing bulk buyers of their stock suddenly appearing. Maybe broke governments/municipalities will start selling them entire libraries and archives to ingest.

reply