Other burning questions: What methodology was used to choose the question set? Why not allow explanations? How many passes were done for each LLM?