There is no such thing as "exactly the same input, but with different preceding context". The preceding context is input!

If a given prompt produced exactly the same output regardless of context, that would mean the context was being ignored, which is indistinguishable from a session that keeps no context at all, where every prompt starts in a fresh, empty context.

Now what some people want is requirements like:

- Different wordings of a prompt with exactly the same meaning should not change the output; e.g. whether you ask "What is the capital of France?" or "What is France's capital?", the answer should be verbatim identical.

- Prior context should not change responses in ways that have no interaction with that context. For instance, given the prompt "what is 2 + 2", the answer should always be the same, unless the context has instructed the LLM that 2 + 2 is to be five.

These kinds of requirements betray a misunderstanding of what these LLMs are.

reply
I wonder if there's a way to use an LLM to rewrite the prompt, standardizing the wording when two prompts mean the same thing?
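Something like a canonicalization pass in front of a cache, maybe. A minimal sketch of the idea (the `llm_rewrite` stub is hypothetical; a real version would be another model call, with all the nondeterminism that implies):

```python
# Sketch: canonicalize prompts before answering, so paraphrases
# share one cache entry. llm_rewrite() is a stand-in for a real
# "rewrite this question in canonical form" model call.
def llm_rewrite(prompt):
    # Stub: hand-written paraphrase table, purely for illustration.
    canon = {
        "what is france's capital": "what is the capital of france",
    }
    key = prompt.lower().rstrip("?").strip()
    return canon.get(key, key)

cache = {}

def cached_answer(prompt, answer_fn):
    # Answer each canonical form at most once.
    key = llm_rewrite(prompt)
    if key not in cache:
        cache[key] = answer_fn(key)
    return cache[key]
```

Of course, as soon as the rewriter is itself an LLM, you have just moved the consistency problem one layer down.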
reply
Deterministically, you mean? ;)
reply
It's going to backfire. In real scenarios (not regression testing), users don't want to see exactly the same output twice from the LLM in the same session while they are trying to refine the result with more context.

There are going to be false positives: text that is subtly different from a previous response is misidentified as a duplicate such that the previous response is substituted for it, frustrating the user.

reply
Not an expert, but I've been told that RAG combined with a database of facts is one way to get more consistency here. Using one of the earlier examples, you might have a knowledge store (usually some kind of vector database) that maps countries to capitals, and the LLM would query it whenever it had to come up with an answer, rather than relying on whatever was baked into the base model.
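A toy sketch of that shape (a plain dict stands in for the vector store, and `retrieve` for the real similarity search; the names are mine, not any particular library's):

```python
# Toy retrieval-augmented lookup: the answer comes from a fact
# store instead of from model weights. A dict stands in for the
# vector database.
FACTS = {
    "france": "Paris",
    "japan": "Tokyo",
}

def retrieve(country):
    # Real RAG would embed the query and run a nearest-neighbor
    # search; exact lookup is exactly why this stays consistent.
    return FACTS.get(country.lower())

def answer_capital(country):
    fact = retrieve(country)
    if fact is None:
        return "I don't know."
    # The LLM would normally do the phrasing; the fact is pinned.
    return f"The capital of {country} is {fact}."
```

The phrasing around the fact can still vary run to run; what the store pins down is the fact itself.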
reply
This is really useful for reproducing bugs.
reply
I was with you until you said it “doesn’t really help”. Did you mean “doesn’t completely solve the problem”?
reply