
This is not a benchmark. They just want to give people the opportunity to try their hand at solving novel questions with AI and see what happens. If an AI company pulls a solution out of their hat that cannot be replicated with the products they make available to ordinary people, that's hardly worth bragging about and in any case it's not the point of the exercise.

by YeGoblynQueenne 39 minutes ago

Hey, sorry, totally out of context but I've always wanted to ask about the username. I keep reading it as "yoruba" in my mind. What does it mean, if I'm not being indiscreet?

by fph 2 hours ago

The authors mention that before publication they tested these questions on Gemini and GPT, so the questions have already been available to the two biggest players; they have a head start.

by data_maan 54 minutes ago

Looks like very sloppy research.

by cocoto 2 hours ago

They could solve the problems and train the next models on the answers, so that future models could “solve” these.

by data_maan 53 minutes ago

Nothing prevents them, and they are already doing that. I work in this field and one can be sure that now, because of the notoriety this preprint got, the questions will be solved soon.

by conformist 2 hours ago

It's possible but unlikely given the short timeline, diverse questions that require multiple mathematicians, and low stakes. Also they've already run preliminary tests.

by blenderob 2 hours ago

> It's possible but unlikely given the short timeline

Yep. "possible but unlikely" was my take too. As another person commented, this isn't really a benchmark, and as long as that's clear, it seems fair. My only fear is that some submissions may be AI-assisted rather than fully AI-generated, with crucial insights coming from experienced mathematicians. That's still a real achievement even if it's human + AI collaboration. But I fear that the nuance would be lost on news media and they'll publish news about the dawn of fully autonomous math reasoning.

by iLoveOncall 1 hour ago

That was exactly my first thought as well. All those exercises are pointless and people don't seem to understand that; it's baffling.

Even if it's not Anthropic or OpenAI paying for the solutions, maybe it'll be someone solving them "for fun" because the paper got popular and posting them online.

It's a futile exercise.