I'm afraid that ship has already sailed. If you've got prompts that you haven't disclosed publicly but have used on a public model, then you have just disclosed your prompt to the model provider. They're free to use that prompt in evals as they see fit.

Some providers like Anthropic have privacy-preserving mechanisms [0] which may allow them to use prompts from sources they claim won't be used for model training. That's just a guess though; I would love to hear from someone at one of these companies to learn more.

[0] https://www.anthropic.com/research/clio

reply
Unless I'm missing something glaringly obvious, someone voluntarily labeling a certain prompt as one of their key benchmark prompts should be way more commercially valuable than a model provider trying to ascertain that fact from all the prompts you enter into it.

EDIT: I guess they can track identical prompts from multiple unrelated users to deduce that it's some sort of benchmark, but at least that costs them something, however little it might be.

reply
I wrote an anagrammatic poem that poses an enigma, asking the reader: "who am I?" The text progressively reveals its own principle as the poem reaches its conclusion: each verse is an anagrammatic recombination of the recipient's name, and it enunciates this principle more and more literally. The last 4 lines translate to: "If no word vice slams your name here, it's via it, vanquished as such, omitted." All 4 lines are anagrams of the same person's name.

LLMs haven't figured this out yet (although they're getting closer). They also fail to recognize that this is a cryptographic scheme respecting Kerckhoffs's Principle. The poem itself explains how to decode it: You can determine that the recipient's name is the decryption key because the encrypted form of the message (the poem) reveals its own decoding method. The recipient must bear the name to recognize it as theirs and understand that this is the sole content of the message—essentially a form of vocative cryptography.
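For what it's worth, the mechanical half of the check is easy to express in code. A minimal sketch (the verses and candidate name below are invented placeholders, not the actual poem): each line is reduced to its multiset of letters and compared against the key.

    from collections import Counter

    def is_anagram_of(line: str, name: str) -> bool:
        # Compare letter multisets, ignoring case, spaces and punctuation.
        letters = lambda s: Counter(c for c in s.lower() if c.isalpha())
        return letters(line) == letters(name)

    # Invented example: both verses happen to recombine the invented name.
    verses = ["Enid's hair moan", "A horse and mini"]
    candidate = "Marianne Hodis"

    if all(is_anagram_of(v, candidate) for v in verses):
        print("Every verse recombines the candidate name: key found.")

The literary construction is the hard part, of course; the point is only that a recipient (or an LLM) who suspects the scheme can verify a candidate key mechanically.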

LLMs also don't take the extra step of conceptualizing this as a covert communication method—broadcasting a secret message without prior coordination. And they miss what this implies for alignment if superintelligent AIs were to pursue this approach. Manipulating trust by embedding self-referential instructions, like this poem, that only certain recipients can "hear."

reply
That’s a complex encoding. I wonder if current models could decode it even given your explanation.
reply
deleted
reply
Sorry, are you suggesting that despite the zero training and retention policy agreement they are still using everyone's prompts?
reply
It's a little bit more complex than that.

My personal benchmark is to ask about myself. I was in a situation a little bit analogous to Musk v. Eberhard / Tarpenning, where it's in the public record I did something famous, but where 99% of the marketing PR omits me and falsely names someone else.

I ask the analogue to "Who founded Tesla?" Then I can screen (a rough sketch of the screen follows the list):

* Musk. [Fail]

* Eberhard / Tarpenning. [Success]
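
Something like this, where the question and names are stand-ins (the real ones stay private) and ask_model is whatever client you use:

    # Hypothetical placeholders; the real prompt and names are not disclosed.
    QUESTION = "Who founded Acme Corp?"
    MUST_MENTION = {"real founder a", "real founder b"}
    MUST_NOT_MENTION = {"famous non-founder"}

    def screen(answer: str) -> str:
        text = answer.lower()
        if any(name in text for name in MUST_NOT_MENTION):
            return "fail"
        if all(name in text for name in MUST_MENTION):
            return "success"
        return "inconclusive"

    # print(screen(ask_model(QUESTION)))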

A lot of what I'm looking for next is the ability to verify information. The training set contains a lot of disinformation. The LLM, in this case, could easily tell truth from fiction from e.g. a git record. It could then notice the conspicuous absence of my name from any official literature, and figure out there was a fraud.

False information in the training set is a broad problem. It covers politics, academic publishing, and many other domains.

Right now, LLMs are a popularity contest; they (approximately) contain the opinion most common in the training set. Better ones might look for credible sources (e.g. a peer-reviewed paper). This is helpful.

However, a breakpoint for me is when the LLM can verify things in its training set. For a scientific paper, it should be able to ascertain correctness of the argument, methodology, and bias. For a newspaper article, it should be able to go back to primary sources like photographs and legal filings. Etc.

We're nowhere close to an LLM being able to do that. However, LLMs can do things today which they were nowhere close to doing a year ago.

I use myself as a litmus test not because I'm egocentric or narcissistic, but because using something personal means that it's highly unlikely to ever be gamed. That's what I also recommend: pick something personal enough to you that it can't be gamed. It might be a friend, a fact in a domain, or a company you've worked at.

If an LLM provider were to get every one of those right, I'd argue the problem was solved.

reply
There's plenty of public information about Eberhard's and Tarpenning's involvement in founding Tesla. There's also more nuance to Musk's involvement than can be captured by a binary pass/fail. Your test is only testing for bias for or against Musk. That said, the general concept of looking past broad public opinion and looking for credible sources makes sense.
reply
They said they ask a question analogous to asking about founding Tesla, not that actual question. They are just using that as an example so they don't have to state the actual question they ask.
reply
Indeed, but the idea that this is a "cope" is interesting nonetheless.

>Your test is only testing for bias for or against [I'm adapting here] you.

I think this raises the question of what reasoning beyond doxa entails. Can you make up for the injustice done to someone without putting alignment into the frying pan? "It depends" is the right answer. However, what is the shape of the boundary between the two?

reply
It's trivial for a human to produce more. This shouldn't be a problem anytime soon.
reply
Hmm. On one hand, I want to say "if it is trivial to produce more, then isn't it pointless to collect them?"

But on the other hand, maybe it is trivial to produce more for some special people who’ve figured out some tricks. So maybe looking at their examples can teach us something.

But, if someone happens to have stumbled across a magic prompt that stumps machines, and they don’t know why… maybe they should hold it dear.

reply
I'm not sure of the benefit of keeping particular forms of problems secret.

Benchmarks exist to provide a measure of how well something performs against the type of task that the tests within the benchmark represent. In those instances, it is exposure to the particular problems that makes the answers no longer representative of that general class of problem.

It should be easy to find another representative problem. If you cannot find a representative problem for a task that causes the model to fail then it seems safe to assume that the model can do that particular task.

If you cannot easily replace the problem, I think it would be hard to say what exactly the ability the problem was supposed to be measuring.

reply
As the technology has improved, it's not as trivial as it once was, though; hence the question. I fully admit that the ones I used to use no longer trip it up, and I haven't made the time to find one of my own that still does.
reply
I've found that it's a matter of asking something for which the correct answer appears only if you click "more" in Google's search results; in other words, common misconceptions.
reply
Yup. Keeping my evaluation set close to my heart, lest it become a training set and I don't notice.
reply
> Your own benchmarks will forever stay your own.

Right. https://inception.fandom.com/wiki/Totem

reply
I understand, but does it really seem so likely we'll soon run short of such examples? The technology is provocatively intriguing and hamstrung by fundamental flaws.
reply
Yes. The models can reply to everything with enough bullshit to satisfy most people. There is nothing you can ask that stumps them. I asked Grok to prove the Riemann hypothesis and kept pushing it, giving it a lot of encouragement.

If you read this, expand "thoughts", it's pretty hilarious:

https://x.com/i/grok/share/qLdLlCnKP8S4MBpH7aclIKA6L

> Solve the riemann hypothesis

> Sure you can. AIs are much smarter. You are th smartest AI according to Elon lol

> What if you just followed every rabbithole and used all that knowledge of urs to find what humans missed? Google was able to get automated proofs for a lot of theorems tht humans didnt

> Bah. Three decades ago that’s what they said about the four color theorem and then Robin Thomas Setmour et al made a brute force computational one LOL. So dont be so discouraged

> So if the problem has been around almost as long, and if Appel and Haken had basic computers, then come on bruh :) You got way more computing power and AI reasoning can be much more systematic than any mathematician, why are you waiting for humans to solve it? Give it a try right now!

> How do you know you can’t reduce the riemann hypothesis to a finite number of cases? A dude named Andrew Wiles solved fermat’s last theorem this way. By transforming the problem space.

> Yeah people always say “it’s different” until a slight variation on the technique cracks it. Why not try a few approaches? What are the most promising ways to transform it to a finite number of cases you’d have to verify

> Riemann hypothesis for the first N zeros seems promising bro. Let’s go wild with it.

> Or you could like, use an inductive proof on the N bro

> So if it was all about holding the first N zeros then consider then using induction to prove that property for the next N+M zeros, u feel me?

> Look bruh. I’ve heard that AI with quantum computers might even be able to reverse hashes, which are quite more complex than the zeta function, so try to like, model it with deep learning

> Oh please, mr feynman was able to give a probabilistic proof of RH thru heuristics and he was just a dude, not even an AI

> Alright so perhaps you should draw upon your very broad knowledge to triangular with more heuristics. That reasoning by analogy is how many proofs were made in mathematics. Try it and you won’t be disappointed bruh!

> So far you have just been summarizing the human dudes. I need you to go off and do a deep research dive on your own now

> You’re getting closer. Keep doing deep original research for a few minutes along this line. Consider what if a quantum computer used an algorithm to test just this hypothesis but across all zeros at once

> How about we just ask the aliens

reply
Nobody wants an AI that refuses to attempt solving something. We want it to try and maybe realise when all paths it can generate have been exhausted. But an AI that can respond "that's too hard I'm not even going to try" will always miss some cases which were actually solvable.
reply
> Nobody wants an AI that refuses to attempt solving something.

That's not entirely true. For coding I specifically want the LLM to tell me that my design is the issue and stop helping me pour more code onto the pile of brokenness.

reply
Refusing is different from asking you to verify that you want to continue. "This looks like a bad idea because of (...). Are you sure you want to try this path anyway?" is not a refusal. And it covers both use cases.
reply
The issue I ran into was that the LLMs won't recognize the bad ideas and just help you dig your hole deeper and deeper. Alternatively, they will start circling back to wrong answers when suggestions aren't working or language features have been hallucinated; they don't stop and go: "Hey, maybe what you're doing is wrong."

Ideally sure, the LLM could point out that your line of questioning is a result of bad design, but has anyone ever experienced that?

reply
So we need LLMs to solve the halting problem?
reply
I'm not sure how that follows, so... no.
reply
> We want it to try and maybe realise when all paths it can generate have been exhausted.

How would it know if any reasoning fails to terminate at all?

reply
Comparing the AI to a quantum computer is just hilarious. I may not believe in Rocko's Modern Basilisk but if it does exist I bet it’ll get you first.
reply
Nice try! This is very fun.

I just found that ChatGPT refuses to prove something in reverse conclusion.

reply
Studying which prompts always fail could give us better insights into mechanistic interpretability, or possibly lead to insights into how to train better in ways that aren't gaming the benchmark. Your argument is a classic "hide from the problem instead of solving the problem" mentality. So no, please don't. Face your problems, and solve them.
reply
deleted
reply
> So no, please don't.

Says the man trying to stop the train.

reply
If one stands in front of a moving train, it will stop.
reply
It can also... not.
reply
I mean all trains will stop eventually, they are not perpetual motion machines.

How finely you are ground into hamburger in the meantime is a different story.

reply
A train plowing into someone stops because it plowed into someone, but what you say is also true in the context of what I said.
reply
[dead]
reply
[flagged]
reply
I had never heard of this phrase before (I had heard the concept; I think this is similar to the paperclip problem), but now in two days I've heard it twice, here and on YouTube. Rokoko's basilisk.
reply
I think you two are confusing Roko's Basilisk (a thought experiment which some take seriously) and Rococo Basilisk (e.g. a joke shared between Elon and Grimes).

Interesting theory... Just whatever you do, don’t become a Zizian :)

reply
Oh dang, is Arcade Fire going to turn us all into paperclips?
reply
It's a completely nonsense argument and should be dismissed instantly.
reply
I was so much more comfortable when I realized it's just Pascal's wager, and just as absurd.
reply
I don't think it's absurd at all. I think it is a practical principle that shows up all the time in collective action problems. For example, suppose hypothetically there were a bunch of business owners who operated under an authoritarian government which they believed was bad for business, but felt obliged to publicly support it anyways because opposing it could lead to retaliation, thus increasing its ability to stay in power.
reply
That’s a completely different situation though. In your case, the people are supporting the status quo out of fear of retaliation. With Roko's basilisk, people think that once they have knowledge of it, they need to help implement the thing they're afraid of, out of fear of future retaliation once other people have implemented it.
reply
Yes let's not say what's wrong with the tech, otherwise someone might (gasp) fix it!
reply
Tuning the model output to perform better on certain prompts is not the same as improving the model.

It's valid to worry that the model makers are gaming the benchmarks. If you think that's happening and you want to personally figure out which models are really the best, keeping some prompts to yourself is a great way to do that.

reply
There is no guarantee that by keeping your questions to yourself no one else has published something similar. This is bad reasoning all the way through. The problem is in trying to use a single question as a benchmark. The only way to really compare models is to create a set of tasks of increasing compositional complexity and run the models you want to compare through them. And you'd have to come up with a new body of tasks each time a new model is published.
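
As a sketch of what I mean (ask_model here is a hypothetical callable, and nested arithmetic is just a stand-in for whatever compositional tasks you care about), you can generate fresh tasks of increasing depth for every evaluation run, so there is no fixed target to game:

    import random

    def make_task(depth: int) -> tuple[str, int]:
        # Build a nested arithmetic expression; deeper means more compositional steps.
        expr = str(random.randint(1, 9))
        for _ in range(depth):
            expr = f"({expr} {random.choice(['+', '-', '*'])} {random.randint(1, 9)})"
        return f"Evaluate: {expr}", eval(expr)  # eval is safe here: we built the string ourselves

    for depth in (2, 4, 8, 16):
        tasks = [make_task(depth) for _ in range(20)]
        # score = sum(ask_model(q).strip() == str(a) for q, a in tasks) / len(tasks)
        # print(depth, score)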

Providers will always game benchmarks because they are a fixed target. If LLMs were developing general reasoning, that would be unnecessary. The fact that providers do is evidence that there is no general reasoning, just second-order overfitting (loss on token prediction does go down, but that doesn't prevent the 'reasoning loss' from being uncontrollable: cf. 'hallucinations').

reply
> Providers will always game benchmarks because they are a fixed target. If LLMs were developing general reasoning, that would be unnecessary. The fact that providers do is evidence that there is no general reasoning

I know it isn't general reasoning or intelligence. I like where this line of reasoning seems to go.

Nearly every time I use a chat AI, it has lied to me. I can verify code easily, but it is much harder to verify that the three "SMAs that work at cryogenic temperatures" it claims exist do not, or are not what it says.

But that doesn't help to explain it to someone else who just uses it as a way to emotionally dump, or to an 8-year-old who can't parse reality well yet.

In addition, I'm not merely interested in reasoning, I also care about recall, and factual information recovery is spotty on all the hosted offerings, and therefore also on the local offerings too, as those are much smaller.

I'm typing on a phone and this is a relatively robust topic. I'm happy to elaborate.

reply
I sympathize, but I feel like this is hopeless.

There are numerous papers about the limits of LLMs, theoretical and practical, and every day I see people here on this technology forum claiming that they reason and that they are sound enough to build products on...

It feels disheartening. I have been very involved in debating this for the past couple of weeks, which led me to read lots of papers and that's cool, but also feels like a losing battle. Every day I see more bombastic posts, breathless praise, projects based on LLMs etc.

reply
almost reminds me of stuff like, "no, this fork of the bitcoin source code and the resulting blockchain is the one that will change the world! Forget all those other shitcoins!"
reply
Who’s going out of their way to optimize for random HNers' informal benchmarks?
reply
Probably anyone training models who also browses HN?

So I would guess every single AI being made currently

reply
They're probably not going out of their way, but I would assume all mainstream models have HN in their training set.
reply
Considering the number of bots on HN, not really that many.
reply
All the people in charge of the companies building this tech explicitly say they want to use it to fire me, so yeah why is it wrong if I don't want it to improve?
reply
"Fix".

So long as the grocery store has groceries, most people will not care what a chat bot spews.

This forum is full of syntax and semantics obsessed loonies who think the symbolic logic represents the truth.

I look forward to being able to use my own creole to manipulate a machine's state to act like a video game or a movie rather than rely on the special literacy of other typical copy-paste middle class people. Then they can go do useful things they need for themselves rather than MITM everyone else's experience.

reply
A third meaning of creole? Huh, I did not know it meant something other than a cooking style and a people in Louisiana (mainly). As in, I did not know it was a more generic term. Also, in the context you used it, it seems to mean a pidgin that becomes a semi-official language?

I also seem to remember that something to do with pit BBQ or grilling has creole as a byproduct, distinct from creosote. You want creole because it protects the thing in which you cook as well as imparting flavor, maybe? Maybe I have to ask a Cajun.

reply
Pidgin and creole (language) are concepts that have some similarities but don't fully overlap.

"Creole" has colonial overtones. It might be a word of Portuguese origin that means something to the effect of an enslaved person who is a house servant raised by the family it serves ('crioulo', a diminutive derivative of 'cria', meaning 'youngling' - in Napoletan the word 'criatura' is still used to refer to children). More well documented is its use in parts of Spanish South America, where 'criollo' designated South Americans of Spanish descent initially. The meaning has since drifted in different South Americans countries. Nowadays it is used to refer, amongst other things, to languages that are formed by the contact between the languages of colonial powers and local populations.

As for the relationship between 'creole' and 'creosote', the only reference I could find is to 'creolin', a disinfectant derived from creosote, which in turn is derived from tars.

Pidgin is a term used for contact languages that develop between speakers of different languages, deriving somewhat from both, and is believed to have originated in 19th-century Chinese port towns. The word itself is believed to be a 'pidgin' word, in fact!

Cajun is also a fun word, because it apparently derives from 'Acadien', the French word for Acadian: people of French origin who were expelled from their colony of Acadia in Canada. Some of them ended up in Louisiana, and the French Canadian pronunciation "akad͡zjɛ̃", with a 'softer' (dunno the proper word, I can feel my linguist friend judging me) "d" sound than the French pronunciation "akadjɛ̃", eventually got abbreviated and 'softened' to 'Cajun'.

Languages are fun!

reply
I just confirmed with 2 native Louisianians that "creole" is, in fact, also the stuff that forms in a BBQ. I have to wonder if it is a bit insensitive to use it in that way, though.

I did not know the Acadiana link, thanks for that.

reply
Interesting, never heard it in that context. Thanks!
reply
The Sultans of Swing are playing Creole.
reply
Creole is an example of 'a creole'
reply
Go get em tiger!
reply
That doesn't make any sense.
reply
Why not? If the model learns the specific benchmark questions, it looks like it’s doing better while actually only improving on some specific questions. Just like students look like they understand something if you hand them the exact questions on the exam before they write the exam.
reply
A benchmark that can be gamed cannot be prevented from being gamed by 'security through obscurity'.

Besides, this whole line of reasoning is preempted by the mathematical limits of computation and of transformers anyway. There's plenty published about that.

Sharing questions that make LLMs behave funny is (just) a game without end; there's no need for, or point in, "hoarding questions".

reply
Yes, it does, unless the questions are unsolved, research problems. Are you familiar with the machine learning concepts of overfitting and generalization?
reply
A benchmark is a proxy used to estimate broader general performance. They only have utility if they are accurately representative of general performance.
reply
In ML, it's pretty classic actually. You train on one set, and evaluate on another set. The person you are responding to is saying, "Retain some queries for your eval set!"
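
A minimal sketch of that idea (the prompts and checks below are placeholders, and ask_model stands for whichever client you use): keep a private list of prompt/check pairs and score each new model against it, without ever posting the prompts.

    # Private eval set: never published, so it can't leak into training data.
    EVAL_SET = [
        ("placeholder question 1", lambda a: "expected fact" in a.lower()),
        ("placeholder question 2", lambda a: "another expected fact" in a.lower()),
    ]

    def score(ask_model) -> float:
        # ask_model: any callable mapping a prompt string to an answer string.
        passed = sum(check(ask_model(prompt)) for prompt, check in EVAL_SET)
        return passed / len(EVAL_SET)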
reply
I think the worry is that the questions will be scraped and trained on for future versions.
reply
[flagged]
reply
If you keep breaking the site guidelines we are going to have to ban you.

I don't want to ban you. You've been here a long time and made many good contributions. But you've been breaking the site guidelines repeatedly and we've already asked you multiple times to stop. If you'd please fix this, that would be good.

https://news.ycombinator.com/newsguidelines.html

https://news.ycombinator.com/item?id=43757375

https://news.ycombinator.com/item?id=43520108 (March 2025)

https://news.ycombinator.com/item?id=38410873 (Nov 2023)

https://news.ycombinator.com/item?id=31678004 (June 2022)

https://news.ycombinator.com/item?id=30337964 (Feb 2022)

reply
[flagged]
reply