The model could report the confidence of its output distribution, but that confidence isn't necessarily calibrated (that is, even if it tells you it's 70% confident, it isn't necessarily right 70% of the time). Famously, pre-trained base models are well calibrated, but they stop being calibrated once they are post-trained into instruction-following chatbots [1].
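
To make "calibrated" concrete, here's a minimal sketch of how you could check it, assuming you already have a list of (stated confidence, was the answer correct) pairs; the numbers below are made up:

    # Sketch: bin stated confidences and compare each bin's average
    # confidence to its empirical accuracy. For a calibrated model
    # the two numbers track each other.
    results = [(0.9, True), (0.7, False), (0.7, True), (0.5, False), (0.95, True)]

    bins = 10
    for b in range(bins):
        lo, hi = b / bins, (b + 1) / bins
        in_bin = [(c, ok) for c, ok in results if lo <= c < hi]
        if not in_bin:
            continue
        avg_conf = sum(c for c, _ in in_bin) / len(in_bin)
        accuracy = sum(ok for _, ok in in_bin) / len(in_bin)
        print(f"bin [{lo:.1f}, {hi:.1f}): confidence {avg_conf:.2f}, accuracy {accuracy:.2f}")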

Edit: There is also some other work pointing out that chat models, while possibly not calibrated at the token level, might still be calibrated at the concept level [2]. That means if you sample many answers and group them by semantic similarity, the frequency of each group is calibrated. The problem is that generating many answers and grouping them is much more costly.

[1] https://arxiv.org/pdf/2303.08774, Figure 8.

[2] https://arxiv.org/pdf/2511.04869, Figure 1.
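
A rough sketch of the concept-level idea, with a deliberately crude notion of "same meaning" (exact match after normalisation) standing in for proper semantic clustering, and a placeholder sample_answer() you'd wire up to your own model:

    # Sketch of concept-level confidence: sample many answers, group ones
    # that "mean the same thing", and report each group's frequency.
    from collections import Counter

    def sample_answer(prompt: str) -> str:
        raise NotImplementedError  # call your LLM with temperature > 0

    def normalize(ans: str) -> str:
        # crude stand-in for semantic similarity; real work uses
        # embedding- or entailment-based clustering
        return ans.strip().lower().rstrip(".")

    def concept_confidence(prompt: str, n: int = 20) -> dict[str, float]:
        counts = Counter(normalize(sample_answer(prompt)) for _ in range(n))
        return {ans: c / n for ans, c in counts.most_common()}

    # e.g. {"paris": 0.85, "lyon": 0.15} -> report 85% confidence in "Paris"

The cost point is visible right there: you pay for n full generations instead of one.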

reply
In absolute terms, sure, but the token stream's confidence changes as it's coming out, right? Consumer LLMs typically have a lot of window dressing. My sense is that this encourages the model to stay on topic, and that it's mostly "high confidence" fluff. As it's spewing text/tokens back at you, maybe you'd expect a sudden dip in confidence when it starts hallucinating?

You could color-code the output tokens so you can see abrupt changes.

It seems kind of obvious, so I'm guessing people have tried this
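
A toy version would be easy to hack up if the API hands back per-token logprobs; something like this sketch, with made-up (token, logprob) pairs:

    # Sketch: color tokens by their probability using ANSI escape codes.
    # `tokens` is hypothetical; in practice you'd take it from an API
    # that returns per-token logprobs alongside the text.
    import math

    tokens = [("The", -0.1), ("capital", -0.3), ("of", -0.05),
              ("Australia", -0.2), ("is", -0.1), ("Sydney", -2.5)]

    def colorize(token: str, logprob: float) -> str:
        p = math.exp(logprob)
        if p > 0.8:
            code = "32"   # green: high probability
        elif p > 0.4:
            code = "33"   # yellow: medium
        else:
            code = "31"   # red: low probability, worth a second look
        return f"\033[{code}m{token}\033[0m"

    print(" ".join(colorize(t, lp) for t, lp in tokens))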

reply
Look up “dataloom”. People have been playing with this idea for a while. It doesn’t really help much with spotting errors, because errors usually aren’t due to a single token (unless the answer is exactly one token), and often you need to reason across low-probability tokens to eventually reach the right answer.
reply
Having a confidence score isn't as useful as it seems unless you (the user) know a lot about the contents of the training set.

Think of traditional statistics. Suppose I said "80% of those sampled preferred apples to oranges, and my 95% confidence interval is within +/- 2% of that" but then I didn't tell you anything about how I collected the sample. Maybe I was talking to people at an apple pie festival? Who knows! Without more information on the sampling method, it's hard to make any kind of useful claim about a population.

This is why I remain so pessimistic about LLMs as a source of knowledge. Imagine you had a person who was raised from birth in a completely isolated lab environment and taught only how to read books, including the dictionary. They would know how all the words in those books relate to each other but know nothing of how that relates to the world. They could read the line "the killer drew his gun and aimed it at the victim" but what would they really know of it if they'd never seen a gun?

reply
I think your last point raises the following question: how would you change your answer if you knew they had read all about guns and death and how one causes the other? What if they'd seen pictures of guns? And pictures of victims of guns annotated as such? What if they'd seen videos of people being shot by guns?

I mean, I sort of understand what you're trying to say, but in fact a great deal of the knowledge we have about the world we live in, we get second-hand.

There are plenty of people who've never held a gun, or had a gun aimed at them, and.. granted, you could argue they probably wouldn't read that line the same way as people who have, but that doesn't mean that the average Joe who's never been around a gun can't enjoy media that features guns.

Same thing about lots of things. For instance it's not hard for me to think of animals I've never seen with my own eyes. A koala for instance. But I've seen pictures. I assume they exist. I can tell you something about their diet. Does that mean I'm no better than an LLM when it comes to koala knowledge? Probably!

reply
It’s more complicated to think about, but it’s still the same result. Think about the structure of a dictionary: all of the words are defined in terms of other words in the dictionary, but if you’ve never experienced reality as an embodied person then none of those words mean anything to you. They’re as meaningless as some randomly generated graph with a million vertices and a randomly chosen set of edges according to some edge distribution that matches what we might see in an English dictionary.

Bringing pictures into the mix still doesn’t add anything, because the pictures aren’t any more connected to real world experiences. Flooding a bunch of images into the mind of someone who was blind from birth (even if you connect the images to words) isn’t going to make any sense to them, so we shouldn’t expect the LLM to do any better.

Think about the experience of a growing baby, toddler, and child. This person is not having a bunch of training data blasted at them. They’re gradually learning about the world in an interactive, multi-sensory and multi-manipulative manner. The true understanding of words and concepts comes from integrating all of their senses with their own manipulations as well as feedback from their parents.

Children also are not blank slates, as is popularly claimed, but come equipped with built-in brain structures for vision, including facial recognition, voice recognition (the ability to recognize mom’s voice within a day or two of birth), universal grammar, and a program for learning motor coordination through sensory feedback.

reply
Yes, the actual LLM returns a probability distribution, which gets sampled to produce output tokens.

[Edit: but to be clear, for a pretrained model this probability means "what's my estimate of the conditional probability of this token occurring in the pretraining dataset?", not "how likely is this statement to be true?" And for a post-trained model, the probability really has no simple interpretation other than "this is the probability that I will output this token in this situation".]
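
A minimal sketch of that last step (logits -> softmax -> sampled token), with made-up numbers standing in for a real model's output:

    # Sketch: the model's final layer produces one logit per vocabulary entry;
    # softmax turns them into a probability distribution, which is then sampled.
    import math, random

    vocab = ["Paris", "Lyon", "London", "banana"]
    logits = [4.0, 1.5, 0.5, -3.0]            # made-up numbers

    temperature = 0.8
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # subtract max for numerical stability
    probs = [e / sum(exps) for e in exps]

    token = random.choices(vocab, weights=probs, k=1)[0]
    print(dict(zip(vocab, [round(p, 3) for p in probs])), "->", token)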

reply
It’s often very difficult (intractable) to come up with a probability distribution of an estimator, even when the probability distribution of the data is known.

Basically, you’d need a lot more computing power to come up with a distribution of the output of an LLM than to come up with a single answer.
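
A back-of-the-envelope calculation for why the exact distribution over whole answers is out of reach, and why sampling is the usual workaround (the vocabulary size and answer length are just typical orders of magnitude):

    # Sketch: the space of possible outputs explodes with length, so the
    # full distribution over whole answers can only be approximated.
    import math

    vocab_size = 50_000        # typical order of magnitude
    answer_length = 100        # tokens
    log10_outputs = answer_length * math.log10(vocab_size)
    print(f"~10^{log10_outputs:.0f} possible {answer_length}-token answers")

    # The only tractable route is to draw N samples and use the empirical
    # frequencies, which costs N full generations instead of one.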

reply
What happens before the probability distribution? I’m assuming, say, alignment or other factors would influence it?
reply
In microgpt, there's no alignment. It's all pretraining (learning to predict the next token). But for production systems, models go through post-training, often with some sort of reinforcement learning which modifies the model so that it produces a different probability distribution over output tokens.

But the model "shape" and computation graph itself doesn't change as a result of post-training. All that changes is the weights in the matrices.
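
A toy illustration of that, with a single matrix standing in for the whole network: the forward code is identical before and after "post-training", only the weight values (and hence the output distribution) differ.

    # Sketch: post-training leaves the computation graph alone and only
    # changes the weight values, which shifts the output distribution.
    import numpy as np

    def forward(weights, hidden_state):
        logits = weights @ hidden_state         # same computation in both cases
        exps = np.exp(logits - logits.max())
        return exps / exps.sum()                # softmax over a tiny "vocabulary"

    rng = np.random.default_rng(0)
    hidden = rng.normal(size=8)
    w_pretrained = rng.normal(size=(4, 8))      # stand-ins for real weights
    w_posttrained = w_pretrained + 0.5 * rng.normal(size=(4, 8))

    print(forward(w_pretrained, hidden))        # one distribution over tokens...
    print(forward(w_posttrained, hidden))       # ...same code, different distribution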

reply
Can it generate one? Sure. But it won't mean anything, since you don't know (and nobody knows) the "true" distribution.
reply
> I'm not really sure, but maybe this XXX

You never see this in the response, but you do in the reasoning.

reply
I would assume this varies from case to case, for example:

- How strongly it has been aligned to “know” that something is true (e.g. ethical constraints)

- Statistical significance, i.e. being able to corroborate one alternative in its training data more strongly than another

- If it’s a web-search-related query, whether the statement comes from original sources or is synthesised from, say, third-party sources

But I’m just a layman and could be totally off here.

reply
The LLM has an internal "confidence score", but that has NOTHING to do with how correct the answer is, only with how often the same words came together in the training data.

E.g. "strawberry has two r's" could very well get a very high "confidence score", while a rare but correct fact might well get a very low one.

In short: LLMs have no concept of truth, nor any desire to produce it.

reply
Still, it might be interesting information to have access to, as someone running the model? Normally we read the output trying to build an intuition for the kinds of patterns it produces when it's hallucinating vs. producing something that happens to align with reality. Adding this in could help with that, even if it isn't always correlated with reality itself.
reply
Huge leap there in your conclusion. Looks like you’re hand-waving away the entire phenomenon of emergent properties.
reply
> In short: LLMs have no concept of truth, nor any desire to produce it.

They do produce true statements most of the time, though.

reply
That's just because true statements are more likely to occur in their training corpus.
reply
The overwhelming majority of true statements aren't in the training corpus, due to combinatorial explosion. What does it mean, then, to say they are more likely to occur there?
reply
The training set is far too small for that to explain it.

Try to explain why one-shotting works.

reply
Uh, to explain what? You probably read something into what I said while I was being very literal.

If you train an LLM on mostly false statements, it will generate both known and novel falsehoods. Same for truth.

An LLM has no intrinsic concept of true or false; everything is a function of the training set. It just generates statements similar to what it has seen, plus higher-dimensional analogies of those.

reply