That's ten days ago. As the commenter pointed out, without a web search tool there's no possible way for the model to know whether it's true or not, and the people conducting the study didn't give the models a way to respond with "I don't know".
Thanks; I didn't spot that they disabled tools in the harness. They also don't give the models an "out" to express uncertainty, so the instructions force a guess.
As an aside, though, it's still funny that the two tools WITH search also disagreed.
It's impossible to answer unless you have a *100% complete search tool*.
No system can know everything. It doesn't matter how many tools you give it. It's always wrong to force binary True / False without shades of "I don't know".
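To make the "I don't know" point concrete, here's a rough sketch of what a three-way grader could look like. This is not the study's harness; `Answer`, `parse_answer`, `score`, and the `wrong_penalty` value are all made-up names and numbers for illustration. The idea is only that abstaining scores zero while a wrong guess costs something, so guessing is no longer the forced move.

```python
from enum import Enum


class Answer(Enum):
    """Hypothetical answer space: True, False, or an explicit abstention."""
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


def parse_answer(text: str) -> Answer:
    """Map a raw model reply to an Answer; anything unrecognized is UNKNOWN."""
    normalized = text.strip().lower().rstrip(".")
    if normalized in ("true", "false"):
        return Answer(normalized)
    return Answer.UNKNOWN


def score(answer: Answer, truth: bool, wrong_penalty: float = 1.0) -> float:
    """Score one reply: +1 if correct, 0 if abstained, -wrong_penalty if wrong.

    The penalty value is an assumption chosen for illustration.
    """
    if answer is Answer.UNKNOWN:
        return 0.0
    is_correct = (answer is Answer.TRUE) == truth
    return 1.0 if is_correct else -wrong_penalty
```

With a binary-only grader, "I don't know" would just be marked wrong, which is exactly the incentive to guess being complained about here.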