Otherwise, LLMs have most of the books memorised anyway: https://arstechnica.com/features/2025/06/study-metas-llama-3...
full transcript: pastebin.com/sMcVkuwd
By replacing the names with something unique, you'll get much more certainty.
So it might be there, by predcondiditioning latent space to the area of harry potter world, you make it so much more probable that the full spell list is regurgitated from online resources that were also read, while asking naive might get it sometimes, and sometimes not.
the books act like a hypnotic trigger, and may not represent a generalized skill. Hence why replacing with random words would help clarify. if you still get the origional spells, regurgitation confirmed, if it finds the spells, it could be doing what we think. An even better test would be to replace all spell references AND jumble chapters around. This way it cant even "know" where to "look" for the spell names from training.
I mean, you can try, but it won't be a definitive answer as to whether that knowledge truly exists or doesn't exist as it is encoded into the NN. It could take a lot of context from the books themselves to get to it.
Shoot, I'd even go so far as to write a script that takes in a bunch of text, reorganizes sentences, and outputs them in a random order with the secrets. Kind of like a "Where's Waldo?", but for text
Just a few casual thoughts.
I'm actually thinking about coming up with some interesting coding exercises that I can run across all models. I know we already have benchmarks, however some of the recent work I've done has really shown huge weak points in every model I've run them on.
If I use Opus 4.6 with Extended Thinking (Web Search disabled, no books attached), it answers with 130 spells.
Basically they managed with some tricks make 99% word for word - tricks were needed to bypass security measures that are there in place for exactly reason to stop people to retrieve training material.
> Borges's "review" describes Menard's efforts to go beyond a mere "translation" of Don Quixote by immersing himself so thoroughly in the work as to be able to actually "re-create" it, line for line, in the original 17th-century Spanish. Thus, Pierre Menard is often used to raise questions and discussion about the nature of authorship, appropriation, and interpretation.
Grok and Deepmind IIRC didn’t require tricks.
I shut it down a while ago because the number of bots overtake traffic. The site had quite a bit of human traffic (enough to bring in a few hundred bucks a month in ad revenue, and a few hundred more in subscription revenue), however, the AI scrapers really started ramping up and the only way I could realistically continue would be to pay a lot more for hosting/infrastructure.
I had put a ton of time into building out content...thousands of hours, only to have scrapers ignore robots, bypass cloudflare (they didn't have any AI products at the time), and overwhelm my measly infrastructure.
Even now, with the domain pointed at NOTHING, it gets almost 100,000 hits a month. There is NO SERVER on the other end. It is a dead link. The stats come from Cloudflare, where the domain name is hosted.
I'm curious if there are any lawyers who'd be willing to take someone like me on contingency for a large copyright lawsuit.
Nothing.
You can be sure that this was already known in the training data of PDFs, books and websites that Anthropic scraped to train Claude on; hence 'documented'. This is why tests like what the OP just did is meaningless.
Such "benchmarks" are performative to VCs and they do not ask why isn't the research and testing itself done independently but is almost always done by their own in-house researchers.
> The smug look on Malfoy’s face flickered.
> “No one asked your opinion, you filthy little Mudblood,” he spat.
> Harry knew at once that Malfoy had said something really bad because there was an instant uproar at his words. Flint had to dive in front of Malfoy to stop Fred and George jumping on him, Alicia shrieked, “How dare you!”, and Ron plunged his hand into his robes, pulled out his wand, yelling, “You’ll pay for that one, Malfoy!” and pointed it furiously under Flint’s arm at Malfoy’s face.
> A loud bang echoed around the stadium and a jet of green light shot out of the wrong end of Ron’s wand, hitting him in the stomach and sending him reeling backward onto the grass.
> “Ron! Ron! Are you all right?” squealed Hermione.
> Ron opened his mouth to speak, but no words came out. Instead he gave an almighty belch and several slugs dribbled out of his mouth onto his lap.
In support of that hypothesis, the Fandom site lists it as “mentioned” in Half-Blood Prince, but it says nothing else and I'm traveling and don't have a copy to check, so not sure.
> Ron nodded but did not speak. Harry was reminded forcibly of the time that Ron had accidentally put a slug-vomiting charm on himself. He looked just as pale and sweaty as he had done then, not to mention as reluctant to open his mouth.
There could be something with regional variants but I'm doubtful as the Fandom site uses LEGO Harry Potter: Years 1-4 as the citation of the spell instead of a book.
Maybe the real LLM is the universe and we're figuring this out for someone on Slacker News a level up!
My standard test for that was "Who ends up with Bilbo's buttons?"
I guess they have to add more questions as these context windows get bigger.
How do you know? Each word is one token?