I cannot keep answering everyone's comments of the type "Why did you consider / not consider?" or "Here are much better ideas". I promise you that we have thought quite a bit about the setup and have discussed it with many math researchers.

1. Why do you compare it to multiplying two 1000-digit numbers and not to factoring a 4096-bit number into its two prime factors, when you do not know any details?

2. The questions are of a theoretical nature, even if a little calculation is involved. This does not mean that the problems cannot be solved with a computer program, but it does mean that they cannot be solved with a computer program with reasonable effort.

3. And we do not ask for proofs, because other projects already do that (IMProofBench, please have a look) and because grading LLM-written proofs would require a human to understand each one, which is not what I, we, or indeed most researchers are interested in doing.

reply
> 1. Why do you compare it to multiplying two 1000-digit numbers and not to factoring a 4096-bit number into its two prime factors, when you do not know any details?

The objection is to the phrasing "much harder". One should distinguish between something that is difficult because of a lack of computational power and something that is difficult because of a lack of relevant abstractions or of the ability to grapple with them. If a particular problem is "hard" for a PhD student because they have to do a long calculation, and not because they lack conceptual understanding, then it doesn't say much about the capabilities of generative AI if the computer solves it.

Hence the example: multiplying two large numbers is hard for the former reason, not the latter. Your example of factoring a 4096-bit semiprime is hard for both reasons (because the brute force method is too slow).
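A minimal Python sketch of that contrast, assuming plain built-in integers. The helper name `random_n_digit` and the fixed seed are illustrative choices, not anything from the paper. The multiplication is tedious for a human but trivial for a machine, while naive trial division on a 4096-bit semiprime would have to search a candidate space that is itself a 617-digit number.

```python
import random
import time


def random_n_digit(n, rng):
    """Return a random integer with exactly n decimal digits (illustrative helper)."""
    return rng.randrange(10 ** (n - 1), 10 ** n)


rng = random.Random(0)  # fixed seed, only so the run is repeatable
a = random_n_digit(1000, rng)
b = random_n_digit(1000, rng)

# Multiplying two 1000-digit numbers: a long calculation by hand,
# but a single built-in operation for a computer.
start = time.perf_counter()
product = a * b
elapsed = time.perf_counter() - start
product_digits = len(str(product))
print(f"product has {product_digits} digits, computed in {elapsed:.6f} s")

# Factoring a 4096-bit semiprime by trial division: the smaller prime
# factor can be as large as about 2**2048, so brute force may need to
# try on the order of that many candidates.
candidate_bound = 2 ** (4096 // 2)
print(f"trial-division bound has {len(str(candidate_bound))} decimal digits")
```

The sketch only illustrates the brute-force side of the factoring example; it says nothing about better factoring algorithms.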

reply
Well, you are correct that one should distinguish the two. But we give no indication that the questions are hard because of computational tasks, and we give many indications that the problems are of a theoretical nature and hard for theoretical reasons. There is not a single question where a PhD student would need to do a long calculation.

I trust the judgement of the respected researchers who submitted the questions. I know them personally, they publish research under their full names, and their names are fully disclosed in the paper. You should trust them as well.

Please consider disclosing your name and your field of expertise. Then pick a question in your own research area and explain to me why this question is not research-level. And, best of all, solve it yourself to show why it was too easy.

reply
I'll solve 034.

By [1, Theorem 4.1], the Néron-Severi rank of the perfectoid cover is the same as the Néron-Severi rank of the reduction. For a product E x E' of elliptic curves, it is well known that NS(E x E') = NS(E) + NS(E') + Hom(E,E'); see [2, Prop. 2.3]. Since E = E' here and E is supersingular, Hom(E,E') = End(E) has rank 4, so the rank is 1 + 1 + 4 = 6.
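Spelled out as a worked equation (a sketch only: writing \rho for the Néron-Severi rank is a notational assumption, and the last step uses the standard fact that the endomorphism ring of a supersingular elliptic curve over an algebraically closed field has rank 4 over \mathbb{Z}):

```latex
\begin{align*}
\rho(E \times E')
  &= \operatorname{rank} \mathrm{NS}(E)
   + \operatorname{rank} \mathrm{NS}(E')
   + \operatorname{rank} \mathrm{Hom}(E, E') \\
  &= 1 + 1 + \operatorname{rank} \mathrm{End}(E)
     && \text{(since } E = E' \text{)} \\
  &= 1 + 1 + 4 = 6
     && \text{(since } E \text{ is supersingular)}
\end{align*}
```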

Is it research-level? Of course it takes a graduate student a long time to understand, say, what a perfectoid space is. But the statement follows immediately from quoting the literature, as long as one knows what to quote.

1. https://arxiv.org/pdf/2105.05230
2. https://arxiv.org/pdf/1402.2233

reply
You do see that your own solution is of a purely theoretical nature and not at all what you described before, right? (And no, I am not commenting on your answer.)
reply
Haha, the classic “Why didn’t you do X?” comments always appear. I think a lot of people underestimate how deeply good researchers think about such setups. My genuine standard reply to those folks is: do the research with your setup and publish it.
reply