You can only say True, False, Mostly True or Misleading.
(And you're not allowed to search for information.)
Other burning questions: What methodology was used to choose the question set? Why not allow explanations? How many passes were done for each LLM?