While I get that this is how LLMs work, I think you should work backwards from the user / from what AI as a field is aiming for, and recognize that the parent's "naive" request for reliable responses, no matter what the "context" is, is exactly what a good AI system should offer.

"The context is the input" betrays a misunderstanding of what (artificial) intelligence systems are aiming for.

reply
Then we need something else. This is not how LLMs work. They are simple statistical predictors, not universal answering machines.
reply
I agree mostly. They are all that you say, but if you think about the conditional distribution that you are learning, there is nothing preventing us in principle from mapping different contexts to the same responses. It is rather a practical limitation: we don't have sufficient tools for shaping these distributions soundly. All we can do is throw data at them and hope that they generalize to similar contexts.

We have observed situations where agentic LLM traces on verifiable problems, with deterministic (greedy) decoding, come out either completely correct or completely wrong depending on the minutes shown on the clock, which some tool the LLM called happened to print as incidental output.

I think there may be some mild fixes available for current models. For example, it is worrying that the attention mechanism can never fully disregard any token in the input, because the softmax always assigns a weight > 0 everywhere (and the network has no way of setting a logit to -infinity). This makes it extremely difficult for the LLM to reliably ignore any part of the context.
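
A toy illustration of the softmax point, using plain numpy (the numbers are made up, not from any real model):

    import numpy as np

    def softmax(logits):
        z = logits - logits.max()          # subtract max for numerical stability
        e = np.exp(z)
        return e / e.sum()

    logits = np.array([8.0, 0.0, -20.0])   # the "irrelevant" token gets a very low logit
    print(softmax(logits))                 # last weight ~7e-13: tiny, but still > 0

    logits[-1] = -np.inf                   # only an actual -inf logit yields an exact 0
    print(softmax(logits))                 # [~0.9997, ~3e-4, 0.0]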

However, Yann LeCun actually offers some persuasive arguments that autoregressive decoding has inherent limitations and that we may need something better.

reply
> They are simple statistical predictors, not universal answering machines.

I see this a lot. I kinda doubt the "simple" part, but even beyond that, is there any evidence that a statistical predictor can't be a universal answering machine? I think there's plenty of evidence that our own thinking is at least partly statistical prediction (e.g. when you see a black sheep you don't think "at least one side of this sheep is black", you fully expect it to be black on both sides).

I'm not saying that LLMs _are_ universal answering machines. I'm wondering why people question that they are/they can become one, based on the argument that "fundamentally they are statistical predictors". So they are. So what?

reply
Does your definition of "universal answering machine" include the answers being correct?

If it does, statistical predictors can't help you because they're not always correct or even meaningful (correlation does not imply causation).

If it doesn't, then by all means, enjoy your infinite monkeys.

reply
> These kinds of requirements betray a misunderstanding of what these LLMs are.

They do not. Refusing to bend your requirements to a system that can't satisfy them is not evidence of misunderstanding the system.

And if you tack on "with X 9s of reliability" then it is something LLMs can do. And in the real world every system has a reliability factor like that.

reply
Sure. But the context always starts with the first input, right? And how can you guarantee—or why should you guarantee—that the reply to the first input will always be the same? And if that’s not the case, how can we ensure the preceding context remains consistent?
reply
If an input, along with its context, generated some random seed or hash, this would certainly be possible. Just paste your seed over to your coworker; they supply it to the model and it carries all the contextual information.
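
Rough sketch of one way to read that, in Python (generate() below is a made-up stand-in for whatever model API you actually use, not a real call):

    import hashlib, json

    def context_token(messages):
        # Canonicalize the full context and hash it, so the same conversation
        # always maps to the same token.
        canonical = json.dumps(messages, sort_keys=True, ensure_ascii=False)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    messages = [{"role": "user", "content": "What is the capital of Australia?"}]
    token = context_token(messages)
    seed = int(token[:8], 16)   # fold part of the hash into an integer seed

    # reply = generate(messages, seed=seed, temperature=0)   # hypothetical API
    print(token[:16], seed)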
reply
I wonder if there's a way to use an LLM to rewrite the prompt, standardizing the wording when two prompts mean the same thing?
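
Very rough sketch of what that could look like (complete_text is a made-up stand-in for a real model call; the cache is keyed by the standardized wording):

    from functools import lru_cache

    def complete_text(prompt, temperature=0.0):
        # Stand-in for your actual LLM client; returns a placeholder here.
        return f"[model output for: {prompt!r}]"

    def canonicalize(prompt):
        # Ask the model to restate the request in one fixed, minimal form; with
        # a real model, "capital of France?" and "What's France's capital city?"
        # would hopefully collapse to the same key.
        return complete_text(
            "Rewrite the following request as one short, canonical sentence:\n"
            + prompt,
            temperature=0,
        )

    @lru_cache(maxsize=None)
    def answer(canonical_prompt):
        return complete_text(canonical_prompt, temperature=0)

    def ask(prompt):
        return answer(canonicalize(prompt))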
reply
It's going to backfire. In real scenarios (not regression testing), users don't want to see the exact same thing twice from the LLM in the same session when they're trying to refine the result with more context.

There are going to be false positives: a prompt that is subtly different from a previous one gets misidentified as a duplicate, so the previous response is served in place of a fresh one, frustrating the user.

reply
Google search rewrites misspelled search queries and also lets you override it if that's not what you want. Maybe something similar would work?
reply
Not an expert, but I've been told RAG in combination with a database of facts is one way to get more consistency here. Using one of the previous examples, you might have a knowledge store (usually a vector database of some kind) that contains a mapping of countries to capitals, and the LLM would query it whenever it had to come up with an answer rather than relying on whatever was baked into the base model.
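
Rough sketch of that idea, with a plain dict standing in for the vector store (all names here are made up for illustration):

    # Toy in-memory "knowledge store"; a real setup would embed the query and
    # do a nearest-neighbour search in a vector database instead.
    FACTS = {"France": "Paris", "Japan": "Tokyo", "Australia": "Canberra"}

    def retrieve(country):
        return FACTS.get(country)

    def build_prompt(question, country):
        fact = retrieve(country)
        context = f"Known fact: the capital of {country} is {fact}.\n" if fact else ""
        return context + f"Question: {question}\nAnswer using only the known fact above."

    print(build_prompt("What is the capital of Australia?", "Australia"))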
reply
Deterministically, you mean? ;)
reply
oh so you want it to be thinking???? now we talking
reply