I haven't read the full paper yet, but my intuition is that hallucinations are a byproduct of models having too much information that has to be compressed in order to generalize.

We already know that larger models hallucinate less since they can store more information. Are there any smaller models that hallucinate less?

reply
I'd recommend at least checking out the conclusions section. What I can tell you is that with LLMs the relationship between size and behavior is never that linear. There's always some balance you have to strike, since they really do operate on a changing-anything-changes-everything basis.

excerpt:
Claim: Avoiding hallucinations requires a degree of intelligence which is exclusively achievable with larger models.
Finding: It can be easier for a small model to know its limits. For example, when asked to answer a Māori question, a small model which knows no Māori can simply say “I don’t know” whereas a model that knows some Māori has to determine its confidence. As discussed in the paper, being “calibrated” requires much less computation than being accurate.
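
If it helps intuition, here's a toy sketch of that last point (my own illustration, not code from the paper): under a scoring rule that gives 0 for "I don't know" and a penalty for a wrong answer, a model only needs a rough confidence estimate to decide whether answering beats abstaining. The penalty value and threshold below are illustrative assumptions.

    def should_answer(confidence: float, wrong_penalty: float = 1.0) -> bool:
        """Answer only if the expected score of answering beats abstaining (0).

        Assumed scoring: +1 for a correct answer, -wrong_penalty for a wrong
        one, 0 for saying "I don't know". The expected score of answering is
        confidence - (1 - confidence) * wrong_penalty, so it pays to answer
        only when confidence > wrong_penalty / (1 + wrong_penalty).
        """
        return confidence - (1.0 - confidence) * wrong_penalty > 0.0

    # A small model that knows no Māori: confidence ~0, so it trivially abstains.
    print(should_answer(0.05))  # False -> "I don't know"

    # A model that knows some Māori has to judge whether its partial knowledge
    # clears the bar (confidence > 0.5 when wrong_penalty = 1.0).
    print(should_answer(0.4))   # False -> abstain
    print(should_answer(0.7))   # True  -> worth answering

The point being that estimating roughly how confident you are is a much cheaper requirement than actually knowing the answer.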
