EDIT: I guess they can track identical prompts from multiple unrelated users to deduce that it's some sort of benchmark, but at least it costs them something, however little that might be.
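For what it's worth, that kind of tracking is cheap on their side. A minimal sketch, assuming the provider keeps a log of (user_id, prompt) pairs; the hashing and the threshold are invented for illustration:

```python
import hashlib
from collections import defaultdict

def flag_probable_benchmarks(log, min_distinct_users=5):
    """Flag prompts submitted verbatim by many unrelated users.

    `log` is an iterable of (user_id, prompt) pairs; the threshold is arbitrary.
    """
    users_by_prompt = defaultdict(set)
    for user_id, prompt in log:
        digest = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
        users_by_prompt[digest].add(user_id)
    return {h for h, users in users_by_prompt.items() if len(users) >= min_distinct_users}
```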
LLMs haven't figured this out yet (although they're getting closer). They also fail to recognize that this is a cryptographic scheme respecting Kerckhoffs's Principle. The poem itself explains how to decode it: You can determine that the recipient's name is the decryption key because the encrypted form of the message (the poem) reveals its own decoding method. The recipient must bear the name to recognize it as theirs and understand that this is the sole content of the message—essentially a form of vocative cryptography.
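To make that concrete: the poem's actual embedding isn't specified, so here is one purely hypothetical instantiation where the name is hidden as an acrostic. The decoding method is public, per Kerckhoffs; the only "key" is bearing the right name:

```python
def addressed_to(poem: str, candidate_name: str) -> bool:
    """Return True if the poem's acrostic spells the candidate's name.

    Illustrative only: an acrostic is just one way a name could act as the key.
    """
    initials = "".join(
        line.strip()[0].lower() for line in poem.splitlines() if line.strip()
    )
    return initials == candidate_name.lower()

# Everyone runs the same public check; only the bearer of the name gets a hit,
# and the hit itself is the entire content of the message.
poem = "Roses at dawn\nOpen to the light\nBend toward the sun"  # invented example, spells "Rob"
for reader in ("Alice", "Rob", "Eve"):
    print(reader, addressed_to(poem, reader))
```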
LLMs also don't take the extra step of conceptualizing this as a covert communication method: broadcasting a secret message without prior coordination. And they miss what this implies for alignment if superintelligent AIs were to pursue this approach: manipulating trust by embedding self-referential instructions, like this poem, that only certain recipients can "hear."
My personal benchmark is to ask about myself. I was in a situation a little bit analogous to Musk v. Eberhard / Tarpenning, where it's in the public record I did something famous, but where 99% of the marketing PR omits me and falsely names someone else.
I ask the analogue of "Who founded Tesla?" Then I can screen (a toy grader is sketched after the list):
* Musk. [Fail]
* Eberhard / Tarpenning. [Success]
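A toy grader for that screen, using the Tesla analogue in place of my actual question (the substring matching is obviously a stand-in for actually reading the answer):

```python
def screen(answer: str) -> str:
    """Grade an answer to 'Who founded Tesla?' against the screen above."""
    text = answer.lower()
    if "eberhard" in text or "tarpenning" in text:
        return "Success"
    if "musk" in text:
        return "Fail"
    return "Inconclusive"

print(screen("Tesla was founded by Martin Eberhard and Marc Tarpenning."))  # Success
print(screen("Elon Musk founded Tesla in 2003."))                           # Fail
```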
A lot of what I'm looking for next is the ability to verify information. The training set contains a lot of disinformation. The LLM, in this case, could easily tell truth from fiction by consulting e.g. a git record. It could then notice the conspicuous absence of my name from any official literature, and figure out that there was a fraud.
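As a hedged sketch of what "consulting the git record" could look like (the repository path and the marketing text are placeholders; no current LLM does this as part of answering):

```python
import subprocess

def repo_authors(repo_path: str) -> set[str]:
    """Collect every distinct author name recorded in the repository's history."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%an"],
        capture_output=True, text=True, check=True,
    ).stdout
    return {name.strip() for name in out.splitlines() if name.strip()}

def conspicuously_absent(repo_path: str, official_text: str) -> set[str]:
    """Authors present in the commit history but missing from the official copy."""
    official = official_text.lower()
    return {a for a in repo_authors(repo_path) if a.lower() not in official}

# e.g. conspicuously_absent("/path/to/repo", marketing_copy)
```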
False information in the training set is a broad problem. It covers politics, academic publishing, and many other domains.
Right now, LLMs are a popularity contest; they (approximately) contain the opinion most common in the training set. Better ones might look for credible sources (e.g. a peer-reviewed paper). This is helpful.
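The gap between the two behaviours is roughly majority vote versus source-weighted vote. A toy illustration with invented weights (nothing here reflects how any model actually aggregates its training data):

```python
from collections import Counter

# Each claim appears alongside the kind of source that asserted it.
claims = [
    ("Musk founded Tesla", "press release"),
    ("Musk founded Tesla", "blog"),
    ("Musk founded Tesla", "blog"),
    ("Eberhard and Tarpenning founded Tesla", "court filing"),
]

# Popularity contest: the most frequent claim wins.
popular = Counter(claim for claim, _ in claims).most_common(1)[0][0]

# Credibility weighting: primary or vetted sources count for more (weights invented).
weights = {"court filing": 10.0, "peer-reviewed paper": 10.0, "press release": 1.0, "blog": 0.5}
scores = Counter()
for claim, source in claims:
    scores[claim] += weights.get(source, 1.0)
weighted = scores.most_common(1)[0][0]

print("popularity:", popular)   # Musk founded Tesla
print("weighted:  ", weighted)  # Eberhard and Tarpenning founded Tesla
```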
However, a breakpoint for me is when the LLM can verify things in its training set. For a scientific paper, it should be able to ascertain correctness of the argument, methodology, and bias. For a newspaper article, it should be able to go back to primary sources like photographs and legal filings. Etc.
We're nowhere close to an LLM being able to do that. However, LLMs can do things today which they were nowhere close to doing a year ago.
I use myself as a litmus test not because I'm egocentric or narcissistic, but because using something personal means that it's highly unlikely to ever be gamed. That's what I also recommend: pick something personal enough to you that it can't be gamed. It might be a friend, a fact in a domain, or a company you've worked at.
If an LLM provider were to get every one of those right, I'd argue the problem was solved.
>Your test is only testing for bias for or against [I'm adapting here] you.
I think this raises the question of what reasoning beyond Doxa entails. Can a model correct an injustice like this without putting alignment into the frying pan? "It depends" is the right answer. However, what is the shape of the boundary between the two?