The old model couldn't do math; the new one solved a big open problem.
reply
"Open AI claims that its model has disproven an Erdős conjecture, therefore my crappy way of arguing about software quality is valid."

I really don't know how I'm supposed to reply to stuff like this.

reply
> Open AI claims

You undermine your own point when you misrepresent the situation like this. Real human mathematicians, including at least one Fields Medal winner, have validated and complimented the result.

reply
You demonstrate a truly remarkable level of bias (or just bad faith) in interpreting my comment.

Apparently it doesn't even occur to you that the claim made by Open AI has two necessary components. The first is that the conjecture has been disproven. (This is what was verified by "real human mathematicians".) The second is that the work to disprove the conjecture was done mostly by their AI model rather than by people employed by Open AI.

Funny thing is that even the explainer on OpenAI's own website [1] points out the issue:

"This result does not show us all the times AI has claimed to have a proof of something and been wrong."

"I believe if the level and type of human expertise that is represented on this note had been assembled to find a counterexample to this conjecture a month ago, and those people put in similar amounts of time working on it than they did to reading and thinking about Chat GPT’s solution, the mathematicians would have found a counterexample."

[1] https://cdn.openai.com/pdf/74c24085-19b0-4534-9c90-465b8e29a...

reply
You seem to be saying model capabilities aren't improving. They are. Many mathematicians have looked at the result, confirmed it, and solved some other problems with the technique, which elevates this above a mere claim.
reply
i mean, i am very much still waiting for it to not be slop, but i think fable actually made a bit of headway in this direction. the code it writes, what little of it i saw, makes me want to fall over dead slightly less than the code from other models.
reply