undefined

points

by ratsimihah16 hours ago |

comments

by spzb15 hours ago|

[-]

Mathematical proof vs a web app that doesn't actually run? Not much of a contest.

Never mind the fact that AIs of the LLM-variety haven't and aren't going to find solutions to mathematical problems.

by sdoering14 hours ago|

parent|

[-]

> Never mind the fact that AIs of the LLM-variety haven't and aren't going to find solutions to mathematical problems.

This is empirically wrong as of early 2026.

Since Christmas 2025, 15 Erdos problems have been moved from "open" to "solved" on erdosproblems.com, 11 of them crediting AI models. Problems #397, #728, and #729 were solved by GPT-5.2 Pro generating original arguments (not literature lookups), formalized in Lean, and verified by Terence Tao himself. Problem #1026 was solved more or less autonomously by Harmonic's Aristotle model in Lean.

At IMO 2025, three separate systems (Gemini Deep Think, an OpenAI system, and Aristotle) independently achieved gold-medal performance, solving 5 of 6 problems.

DeepSeek-Prover-V2 hits 88.9% on MiniF2F-test. Top models solve 40% of postdoc-level problems on FrontierMath, up from 2%.

Tao's own assessment as of March 2026: AI is "ready for primetime" in math and theoretical physics because it "saves more time than it wastes."

You can disagree about where this is heading, but "haven't and aren't going to" doesn't survive contact with the data.

by fredoliveira14 hours ago|

parent|

[-]

Indeed. And adding on to this, in a slightly different realm, Donald Knuth's conjecture that he solved with Claude: https://www-cs-faculty.stanford.edu/%7Eknuth/papers/claude-c...

by spzb13 hours ago|

parent|

prev|

[-]

> solved more or less autonomously

So, not autonomously.

by sdoering12 hours ago|

parent|

[-]

q.e.d.

by ratsimihah14 hours ago|

parent|

prev|

[-]

You got really specific to help prove your point. We were generalising to projects built by AI, not web apps that don’t run, which isn’t relevant since LLMs can clearly build fully working projects.

Also how does getting into the specifics of which type of AI can solve mathematical problems helps the comparison here?

by spzb13 hours ago|

parent|

[-]

You were the one who made the comparison