I think this is a good way to test a certain kind of capability, but as to whether LLMs would pass such a test, I'm guessing almost certainly not. If you've ever used one for research, you'll have seen that it's very 'in' the current literature, whatever that may be. It's an incredible retrieval tool, and it will glibly evaluate any novel ideas that you feed in, but its analyses are often incorrect when there's a paucity of directly relevant training data.
This is an active area of research. Demis Hassabis proposed training a model with a strict knowledge cutoff before 1915, and seeing whether it can independently arrive at general relativity.
This is a really fascinating idea… Just another one for the list of side-projects I’d like to get around to but never will!