AA-Omniscience is the only AI benchmark I know of where randomly guessing gets you a lower average score than answering all questions with "I don't know".
For your scenario the confident strategy will give an average of -90. Saying "I don't know" to all will give 0.
A lot of models have negative AA-Omniscience Index.
They also do have AA-Omniscience Accuracy and AA-Omniscience Hallucination Rate that handle "I don't knows" differently.
They are much better incentives. In real life a wrong answer is much more damaging than a don't know.
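To make the arithmetic concrete, here is a minimal sketch. It assumes a simplified scoring rule (+1 for a correct answer, -1 for a wrong one, 0 for an abstention, scaled to the range -100..100). That rule reproduces the numbers above, but it is my assumption, not the official definition of the index, and the 5% hit rate is just the value that yields -90.

```python
def omniscience_style_index(correct: int, incorrect: int, abstained: int) -> float:
    """Assumed scoring: +1 correct, -1 incorrect, 0 abstained, scaled to [-100, 100]."""
    total = correct + incorrect + abstained
    if total == 0:
        raise ValueError("need at least one question")
    return 100.0 * (correct - incorrect) / total


# Hypothetical 100-question run where confident guessing is right 5% of the time.
always_guess = omniscience_style_index(correct=5, incorrect=95, abstained=0)
# Same run, but every question is answered with "I don't know".
always_abstain = omniscience_style_index(correct=0, incorrect=0, abstained=100)

print(always_guess, always_abstain)  # -90.0 0.0
```

Under this rule, guessing only beats abstaining once you are right more than half the time.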
If your example had used "Validate any details before sharing them with the user, with multiple sources" as the system prompt, with a model that is strong at following system prompts precisely and has access to some basic tools, then it'd have spent maybe a few minutes more, but the answer would have been way more accurate.
But no, Google wants "the new search results" (LLM hallucinations) to be on top, so we end up with "sounds plausible" answers instead of a "collection of evidence from reliable/semi-reliable sources" or similar, which sucks. We could have quality, but it's too expensive/slow, so we get slop instead, just to maximize for speed and convenience.
Like when agent 1 says X, agent 2 verifies it as Y and the original question ends up being some weird amalgamation of Z with additional "this is really true" statements sprinkled on top.
I agree Google responses hurt more than help, but I’ve also gotten identical outcomes from 40min self-reasoning Opus threads (it’s less common obviously).
Yeah, seems what grounds agents right now is quite literally human thoughts and text, so if you're doing something like that, you really need to pass the original user prompt through the entire way, for every "child" to keep in mind the final thing, otherwise it does seem to spiral out of control.
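A minimal sketch of what "pass the original user prompt through the entire way" could look like. The `SubTask` class and `fan_out` helper are hypothetical names I made up for illustration, not part of any agent framework; the only point is that every child prompt carries the user's request verbatim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubTask:
    original_prompt: str  # the user's request, copied verbatim to every child
    instruction: str      # what this particular child is asked to do

    def render(self) -> str:
        """Build the text a child agent would receive."""
        return (
            f"Original user request (the final goal):\n{self.original_prompt}\n\n"
            f"Your sub-task:\n{self.instruction}"
        )


def fan_out(original_prompt: str, instructions: list[str]) -> list[SubTask]:
    """Create one sub-task per instruction, each anchored to the same original prompt."""
    return [SubTask(original_prompt, step) for step in instructions]


tasks = fan_out(
    "Which year was the bridge in my photo built?",
    ["Identify the bridge.", "Verify the construction date with two sources."],
)
for task in tasks:
    print(task.render(), end="\n\n")
```

It costs some tokens per child, but a verifier that can see the original question has at least a chance of noticing when the thread has drifted away from it.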
I don't know. Is it?
I don't think anyone is trying to add "a coherent worldview" by reducing hallucinations, not sure how that could even realistically be the aim.
What people want, is for the models to stop giving confident answers that are clearly incorrect. Yes, it won't lead to "a coherent worldview", but it'll at least stop wasting people's time if the model said "You know what, what you said doesn't make sense / isn't clear, is what you mean .... ?" or even "I'm not sure" or "I don't know".
Currently, if you have the wrong starting point and ask the model to do something, it more often than not just goes ahead and does it, misunderstandings or not. Models seem optimized to never push back, unless you prompt for that, and most seem to favor "I'm just gonna assume X" rather than taking a step back and figuring out how to not assume. Again, unless you prompt against that behaviour or steer it into a different workflow.
Training an extra "don't know" token means you have to build a moat between every other token. Between "yes" and "no", you don't have a muddled noisy area where both "yes" and "no" have relatively high probabilities, you need a new peak where "don't know" is higher. Then you just have new muddled areas between "yes" and "don't know", and "don't know" and "no". That requires even more finesse to train another answer in between.
Instead, you could check whether multiple options are about equally likely. But then you have to check if they are actually synonyms, like are the top two choices "Genève" and "Geneva", which is a good sign that the model knows the answer? Or are the top two "yes" and "no"?
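A rough sketch of that check, assuming you already have per-answer probabilities from the model and a synonym map. The `SYNONYMS` table here is hypothetical and hand-written; building it reliably is exactly the hard part described above.

```python
def answer_confidence(
    probs: dict[str, float], synonyms: dict[str, str]
) -> tuple[str, float]:
    """Merge the probability mass of synonymous answers, then return
    (top answer, margin between the top answer and the runner-up)."""
    if not probs:
        raise ValueError("need at least one candidate answer")
    merged: dict[str, float] = {}
    for answer, p in probs.items():
        canonical = synonyms.get(answer, answer)
        merged[canonical] = merged.get(canonical, 0.0) + p
    ranked = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
    top_answer, top_p = ranked[0]
    runner_up_p = ranked[1][1] if len(ranked) > 1 else 0.0
    return top_answer, top_p - runner_up_p


SYNONYMS = {"Genève": "Geneva"}  # hypothetical synonym map, assumed for illustration

# Two near-tied spellings of the same city: the margin is large once they are merged.
city, city_margin = answer_confidence(
    {"Genève": 0.45, "Geneva": 0.44, "Zurich": 0.11}, SYNONYMS
)
# Two near-tied but contradictory answers: the margin stays tiny.
verdict, verdict_margin = answer_confidence(
    {"yes": 0.48, "no": 0.46, "maybe": 0.06}, SYNONYMS
)

print(city, round(city_margin, 2))
print(verdict, round(verdict_margin, 2))
```

A small margin after merging is the "don't know" signal; a small margin before merging tells you nothing.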
The task was simple: using the MS-MARCO[0] dataset, which contains queries, search results, and answers, I made a training set that has:
1. Questions paired with real results supporting them (mixed with some irrelevant results), and a correct answer
2. Questions paired only with irrelevant results, with the answer “No answer present”
The dataset was huge (close to 1M samples), and I trained using different techniques, from SFT (just mimicking the dataset) to DPO (good answer contrasted with a bad answer for the same user query) to GRPO (verifier that checks my annotations whether an answer was present or not)
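Roughly, the two kinds of samples looked like the sketch below. The `Sample` class and its field names are made up for illustration and are not the actual MS-MARCO schema; the example question and passages are invented too.

```python
import random
from dataclasses import dataclass

NO_ANSWER = "No answer present"


@dataclass
class Sample:
    question: str
    passages: list[str]  # search results shown to the model
    target: str          # the answer the model is trained to produce


def answerable(question: str, supporting: list[str], distractors: list[str],
               answer: str, rng: random.Random) -> Sample:
    """Type 1: supporting results mixed with irrelevant ones, plus the correct answer."""
    passages = supporting + distractors  # new list, so the inputs are not mutated
    rng.shuffle(passages)
    return Sample(question, passages, answer)


def unanswerable(question: str, distractors: list[str]) -> Sample:
    """Type 2: only irrelevant results, with the fixed refusal string as the target."""
    return Sample(question, list(distractors), NO_ANSWER)


rng = random.Random(0)
good = answerable(
    "What is the capital of Switzerland?",
    supporting=["Bern is the federal city of Switzerland."],
    distractors=["Geneva hosts many international organisations."],
    answer="Bern",
    rng=rng,
)
bad = unanswerable(
    "What is the capital of Switzerland?",
    distractors=["Fondue is a popular Swiss dish."],
)
print(good.target, "|", bad.target)
```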
Lo and behold, this didn’t reduce hallucination, rather made it much worse. Now the model started claiming “No answer present” even when one was, or even when the question didn’t need search results in the first place (simple stuff like what is X+Y).
Now you could argue that my training was basic compared to what frontier labs could do. Yet I think it hints at a more profound limitation. LLMs are finicky and don’t have a neat understanding of things from first principles (list the search results, check the relevance of each result to the user query, if results are below a certain threshold of relevance then don’t use them to answer …).
tl;dr: not as simple as one might think, perhaps not attainable at all.
You can definitely tune a model to say "I don't know" more often but it will cost you performance, the model will reject some questions that it could answer meaningfully. In the degenerate case the model could collapse predicting that sequence always or almost always.
But I guess my logic breaks down here a bit, because if there is such a thing as a validated answer, then the correct answer is in fact never uncertainty. The correct answer is to continue post training until the model gets it right. So perhaps the real answer is to create RLVR tasks where the valid answer is "I don't know" and nothing else, like this benchmark does. Or maybe that doesn't work either, no matter how many you create.
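For what it's worth, a verifiable reward for such tasks could be as simple as the sketch below. The exact-match check and the reward values (+1, 0, -1) are my assumptions, not how any lab or this benchmark actually scores things.

```python
from typing import Optional

ABSTAIN = "i don't know"


def reward(model_output: str, gold_answer: Optional[str]) -> float:
    """Assumed reward scheme. gold_answer=None marks a task where
    abstaining is the only valid response."""
    said = model_output.strip().lower()
    abstained = said == ABSTAIN
    if gold_answer is None:
        # Unanswerable task: only "I don't know" is rewarded.
        return 1.0 if abstained else -1.0
    if abstained:
        # Answerable task: abstaining is neutral, neither rewarded nor punished.
        return 0.0
    return 1.0 if said == gold_answer.strip().lower() else -1.0


print(reward("I don't know", None))   # abstains on an unanswerable task
print(reward("Paris", None))          # answers an unanswerable task
print(reward("I don't know", "Bern")) # abstains on an answerable task
print(reward("Bern", "Bern"))         # answers correctly
```

The neutral 0 for abstaining on answerable tasks is the knob that matters: make it positive and you risk the collapse into always refusing, make it negative and you are back to rewarding guesses.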
I feel as though there is some kind of philosophical lesson to be had from how hard hallucinations are to get rid of. Maybe, similarly to humans, successful models are often "arrogant" in a sense. Perhaps you just never solve an Erdős problem without some degree of self deception that it's possible for you to do so. In this line of thinking, greatness in humans is actually not related to humility, but just being so good that you actually get things right when you try. Expressing humility is of course something great people tend to do, but I'm referring to what happens under the hood.
If you squint a bit, that's kinda the trend with models. The useful ones are not that much less likely to hallucinate, they are just good enough that they tend to get it right. This comparison is of course probably not even remotely correct, but at least it's fun to anthropomorphize a bit.
1) Has a certain standard of evidence been met?
2) Are the related arguments free of logical inconsistencies?
We can train the LLMs to do 2, and maybe even 1 to some extent (exactly what quality of evidence a computer can practically gather is limited). But that isn't going to get rid of hallucinations, for the same reason courts are hit-and-miss or the conclusions of studies often aren't very reliable. These techniques help, but sometimes they still get people to say things that, on close inspection, turn out to be nonsense. And those best-effort approaches are too much to expect for most questions an LLM will be handed which are informal, low stakes and don't need strong supporting evidence or logical rigour.
I think it is underestimated how many LLM-style hallucinations people themselves have. It just isn't obvious because most humans have a strategy of only repeating what the herd says after it has been socially vetted, which makes their individual eccentricities less obvious.
TL;DR: I don't think it looks like an easy problem for RLVR, it looks technically unsolvable. Even making progress requires a philosophical breakthrough on the nature of truth so that the objective function can be established.
But even in muddy fields of reality like medicine, there are objective facts to be found. When someone comes into an ER with chest pain, you often find a true, undeniable reason for why that is happening. If their lung has collapsed, a coronary artery is clogged or the aorta is dissecting, even if you don't find that out it tends to be clear in retrospect. The area of reality that becomes muddy is when we use proxy signals to try to figure out who gets promoted to expensive/harmful examinations we can make final conclusions from, or the cases that don't fit cleanly into one bucket or the other. But very often, the gold standard truly is golden.
Of course, many realms of reality cannot be verified in this way. But I'd argue that there are quite a few that can.
Does mathematics count as not a hallucination though? Particularly in pure mathematics they take a certain pride coming up with wild concepts as unrooted as possible in anything relevant to human existence. The name of the game is purely about maintaining internal logical consistency - which is something an AI can do while hallucinating.
AI hallucinations in maths might or might not be logically consistent. But in that particular case it starts to get a bit iffy what we call it when someone imagines something that doesn't exist. This gets back to the thing where we can train AIs to be logically consistent, but we can't force that consistency to be grounded in any particular universe. I.e., it'll hallucinate but in a very well rationalised way - coincidentally mimicking how a number of mathematicians seem to approach life.
This is the central issue; there is a very real trade-off between facts and verifiability. Mathematics is perfectly verifiable because it is fact free. We don't have a reliable general system to verify facts. We do have reliable systems for checking arguments (logic).
Yes, there are mathematical concepts that seem to exist purely in the realm of mathematics, but maths often touches reality in a consistent way that reflects experimental results. This seems to imply that there is more to mathematics than just internal consistency. And the parts that do not correspond to any observation right now, might just reach out and touch reality in the future. It is possible to create logically consistent systems that have nothing to do with reality, but this is not the mathematics that most mathematicians are thinking about.
Observation is the final arbiter of fact. Maybe we don't have a general system to verify ALL facts, but many facts are 100% verifiable, although not most of them. "Beyond reasonable doubt" is of course the highest level of fact as far as the scientific method is concerned, but some facts are so far beyond reasonable doubt that you might as well just call them true. In the average living human body, there is a particular clump of tissues that consistently corresponds to a concept most experts would describe as a "heart", and it does in fact pump blood. True fact.
We can rank them based on how much they know and people will gravitate towards those that do know more.
It's a market after all.
Haven't heard about that law, but it seems unlikely we can come up with ("discover") any sort of law that uses a concept ("truth") humans can't even agree on the meaning of, and that's not for lack of trying: we've been trying to figure it out for millennia already with no end in sight.
Might be why we're already rarely seeing models output an "I don't know".
"Confidently incorrect" has negative value. At best, a human realizes the answer is wrong. At worst, the incorrect information is not identified and can cause untold damage. By having the potential to be so severely wrong, it lessens the value of correct answers because there is a lower confidence value on their output.
If someone sold you a "Solved all your problems" machine, and it suddenly doesn't solve all your problems, then probably no, you shouldn't pay.
But the way I'm being sold LLMs, is basically "A text generator that gives you plausible-sounding human text that sometimes hallucinates and gets things wrong, based on your input", then regardless of what the outcome is, I still made use of the "Input > Output" part, which is what I bought into, so I should still pay for that.
Now of course a bunch of people will say they've been sold the former, but the companies themselves seem to be selling the latter. That's my perspective as a person who doesn't follow "influencers" and whatnot though, who seem to be selling the public on the former rather than the latter.
My ask:
> In a couple sentences, explain to me the product I'm being sold with ChatGPT. What does it do for me?
The Reply from ChatGPT:
> ChatGPT is a conversational AI that helps you think, create, learn, analyze, and get things done faster. You can use it to answer questions, draft and edit writing, summarize information, brainstorm ideas, learn new topics, write code, plan projects, and increasingly act as an assistant that can search for information, work with documents, generate images, and help complete tasks.
> In simple terms: you're buying access to an AI that turns natural language into useful work—saving time, expanding your capabilities, and giving you an always-available collaborator for both everyday tasks and specialized knowledge work.
This sounds much more like the former, a "solve all your problems" machine.... not a plausible-sounding text generation machine.
Only two weeks ago Sam Altman said their new data center "could" be where cancer gets cured[0]. It is only the people who deeply understand AI who see it as a text generator of plausible-sounding text. That isn't what the marketing department, the CEO, or the product itself seem to be saying. I'm using OpenAI as the example here, but the others don't seem much different.
> Can I trust the output you give me?
And I assume it explains what to trust vs. not.
I think at the bottom you should also see something like "Any text can contain mistakes" or similar too, which I know is a far cry from what some people push in the press in regards to capabilities, but I still don't see the platforms themselves as lying about this, while I do see a bunch of people constantly over-hyping the possibilities.
I'm not sure why "can I trust the output you give me?" would be a logical followup to the first response it gave me, seeing as its response didn't say anything about hallucinations or mistakes. It said it could do "useful work" with all kinds of examples, including "specialized knowledge work".
The note under the text field, in gray so as not to draw the user's attention, feels more like a CYA line from the lawyers, rather than an instruction they really want users to take to heart. That line also doesn't appear on the main home page. It only shows up after the first prompt is submitted and focus shifts to the conversation. I don't think a CYA line in gray fine print is enough to make users understand it's a plausible-sounding text generation machine instead of an answer machine. Even if I ask that point blank it gives a wordy... yes, but not really, it's being debated by philosophers... response.
> If you can dream it, Claude can help you do it. Claude can process large amounts of information, brainstorm ideas, generate text and code, help you understand subjects, coach you through difficult situations, simplify your busywork so you can focus on what matters most, and so much more.
What marketing copy have you read for LLMs that is like you mentioned?
> But the way I'm being sold LLMs, is basically "A text generator that gives you plausible-sounding human text that sometimes hallucinates and gets things wrong, based on your input"
So, that's all.