the article here isn't about the LLM recognizing works that were in the training data. EG, The Old Man and the Sea off the shelf. It's about pegging the author of novel texts, like, say, some letter written by Hemmingway that gets discovered next week and was never before digitized.
But I'm sure the scanning operations will start scouring the earth even harder for any books unaffected by slop containing niche knowledge and text in order for their models to have an edge over the ones trained only on pirate collections and the Internet.
I wonder if secondhand bookshops and deceased estates are seeing bulk buyers of their stock suddenly appearing. Maybe broke governments/municipalities will start selling them entire libraries and archives to ingest.