The question-tokens define the answer-tokens. That's it. The art lies in clustering the relevant weights together.
Circuits which emerge in the layers during training are much more complicated than a simple Bayesian relation.
There could be. You don't know that the closed-source models aren't using something like DeepSeek's Engram.
While DeepSeek describes this as "knowledge lookup", what Engram is really trying to do is separate dynamic reasoning from static pattern recall, where the static patterns are just word-level n-gram statistics, not declarative facts or knowledge.
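To make the "static pattern recall" point concrete, here is a toy sketch of a lookup keyed purely on surface n-grams. It is a hypothetical illustration, not DeepSeek's Engram code: the hash-bucketed table, `NUM_BUCKETS`, `DIM`, `ngram_bucket` and `static_lookup` are all made up for this example. The point is that the retrieved vector depends only on which words appeared last, not on what they mean.

```python
import hashlib
import random

# Hypothetical toy sketch, NOT DeepSeek's Engram implementation:
# a fixed table of vectors indexed by a hash of the trailing n tokens.
NUM_BUCKETS = 1024  # hypothetical table size
DIM = 4             # hypothetical vector width

rng = random.Random(0)
TABLE = [[rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(NUM_BUCKETS)]


def ngram_bucket(tokens, n=2):
    """Hash the last n tokens into a bucket index (deterministic across runs)."""
    key = " ".join(tokens[-n:]).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def static_lookup(tokens, n=2):
    """Return the stored vector for the trailing n-gram: a table read, no reasoning."""
    return TABLE[ngram_bucket(tokens, n)]


# Same trailing bigram ("made of") -> same vector, whatever the sentence is about.
a = static_lookup(["the", "moon", "is", "made", "of"])
b = static_lookup(["my", "shirt", "is", "made", "of"])
print(a == b)  # True
```

Whatever gets learned into such a table, the key is the word sequence itself, so the lookup has no notion of whether the continuation is true.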
Just because two or three words often appear together in a sequence doesn't mean they represent a fact or truth (or a falsehood); it is just an n-gram statistical regularity.
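A quick way to see this: count trigrams in a small, made-up corpus where a joke claim is repeated more often than the true one. The corpus is invented for illustration; the frequencies only reflect how often the phrases were written, not whether they are correct.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield consecutive n-token tuples from a token list."""
    return zip(*(tokens[i:] for i in range(n)))


# Invented toy corpus: the false claim appears three times, the true one once.
corpus = [
    "the moon is made of cheese",
    "everyone knows the moon is made of cheese",
    "as a joke they said the moon is made of cheese",
    "the moon is made of rock",
]

counts = Counter()
for sentence in corpus:
    counts.update(ngrams(sentence.split(), 3))

print(counts[("made", "of", "cheese")])  # 3
print(counts[("made", "of", "rock")])    # 1
```

The statistics favour "cheese" purely because it was written more often, which is exactly the kind of regularity a static n-gram memory would store.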
If Engram helps reduce LLM GPU memory and FLOP requirements, that is great, but it's not a solution to hallucination.