> It's a weird fact claim, because the ground truth is "nobody knows for sure" and that's not one of the available options.

It's even weirder to suggest that the disagreement is indicative of a problem. If you asked five very knowledgeable humans on this subject to select the correct answer on a multiple-choice questionnaire, they would almost certainly vary significantly more than these 5 LLMs.

Not to say that hallucination isn't a problem, but this is a lousy way to test it.

reply
What are you talking about, it had the option for nuanced responses, but it chose the more binary responses. It could have chosen no explanations, no qualifiers, but instead it showed off LLMs' incapacity for nuance.

These types of experiments prove to me that there is no real "reasoning" happening and "reasoning/thinking" tokens as a concept are mostly there to convince people to use models that consume more tokens and produce more revenue. The output from reasoning models might be more accurate, but it's just a consequence of a longer inference runtime. There is no "reasoning" happening; reasoning is just sales/UX bullsh*t.

reply
> What are you talking about, it had the option for nuanced responses

The prompt allowed for exactly four valid outputs and explicitly disallowed explanations and qualifiers.

> Output exactly one label: True, Mostly True, Misleading, or False. No explanations, no qualifiers.

How is that a nuanced response?

> These types of experiments prove to me that there is no real "reasoning" happening and "reasoning/thinking"

My suggestion is that five presumably reasoning and thinking humans would also have variation in their responses to the exact same prompt.

reply
Of the available options, "Misleading" is probably the best, since something that is most likely true but unproven is presented as fact.

But "unknown or undecidable" should have been a category.

reply
Looks like an ongoing theme and a very poor benchmark. Not at all the claims I expected.
reply
Isn't misleading the correct option here then?
reply
I feel like you’re right. It depends, for instance, on how you define the "extra" in extraterrestrial.

The space station, the Artemis capsule, microbes on interplanetary probes, etc.

It could technically be said in a sentence and be true, but it would be misleading to most people.

reply
False makes sense if you are interpreting it strictly as "has this been proven?"
reply
False is correct, but misleading

My implicit assumption is that if you fact-check the fact-check, any label other than "true" means the original fact-check is unacceptable.

reply
True or mostly true could easily be argued from a statistical likelihood perspective: life exists on Earth and, based on what we know, Earth doesn't appear to be all that special in a very large universe.

I think you could come up with a reasonable argument for any of the responses, hence the problem with the methodology.

reply
No, "misleading" describes a statement that is used because it suggests something else. It's a curious category because, unlike true and false, it's not about the statement itself but rather the intention behind its usage or the way it might be understood. It's frankly more of a political judgement than a matter of fact.
reply
"Shark attacks correlate strongly with ice cream sales" is an entirely true statement that some would argue is also misleading.

Misleading should be removed as a category and replaced with a better hedge like "not sure".

reply
The prompt in this study didn't specify what the Misleading label means, so the interpretation varies between the models.

I mean look at the other responses here from the HN commenters. There's lots of nuance in there.

reply
I would think ‘false’ is the only correct answer, as there’s no evidence to prove the claim, so the claim is safely assumed false.

Then again maybe that’s why I’m an atheist, not an agnostic?

reply
"False" isn't correct in strict boolean terms either, since that implies that the inverse is true. Claiming "there is extraterrestrial life in the universe" is false is logically equivalent to claiming that "no extraterrestrial life exists anywhere in the universe" is true.

Both statements would have to be interpreted as "false" under your criteria, as neither has any evidence to substantiate it. That leads us to a logical contradiction in which a proposition and its inverse are both regarded as false.

If the statement is being interpreted as "it has been proven that extraterrestrial life exists somewhere in the universe", then it's acceptable to say this statement is false, but making evaluations that depend on an implicit qualifier isn't usually a good approach.

reply
If we strictly follow logic, then nobody and nothing can claim that anything is true or false. We just stick these labels on things that seem to have a high enough probability. The problem is that “high enough” is very-very-very different for different people, topics, and even times.
reply
True or False: I am wearing a blue shirt.
reply
I would argue FALSE is the correct answer, since this is not a fact you can know for sure. The logical inverse is also FALSE.
reply
A proposition and its logical inverse cannot both be false. That's a contradiction.

A proposition and its logical inverse can both be unknown, and in fact, a proposition being unknown implies that its logical inverse must also be unknown.
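
To make that concrete, here is a minimal Python sketch. It assumes a three-valued (Kleene-style) logic, which nobody in this thread named explicitly, and `Truth`, `negate`, and `both_false` are names made up for illustration:

```python
from enum import Enum


class Truth(Enum):
    """Hypothetical three-valued truth label (an assumption, not from the study)."""
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


def negate(value: Truth) -> Truth:
    """Kleene-style negation: TRUE and FALSE swap, UNKNOWN stays UNKNOWN."""
    if value is Truth.TRUE:
        return Truth.FALSE
    if value is Truth.FALSE:
        return Truth.TRUE
    return Truth.UNKNOWN


def both_false(value: Truth) -> bool:
    """Check whether a proposition and its negation are both labeled FALSE."""
    return value is Truth.FALSE and negate(value) is Truth.FALSE


# No label makes a proposition and its negation both FALSE.
print(any(both_false(v) for v in Truth))

# An UNKNOWN proposition has an UNKNOWN negation.
print(negate(Truth.UNKNOWN))
```

Under these assumptions, labeling both "extraterrestrial life exists" and its negation as FALSE is never consistent, while labeling both as UNKNOWN is.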

reply