I am just testing it on stuff I know intimately myself. I would probably not understand a proof of Collatz if it was dancing in front of me!
Sorry to belabor this but it's basically pointless saying you have nuts it can't crack without showing us the nuts.
I gave a high-level description of the problems in a sibling thread. They are the kind of small problems which I suppose every researcher has lying around, waiting for them to think about some day. But not the big problems everyone is waiting to see solved.
My comment was not meant to be a tease – sorry! I assumed there would be other people in a similar situation, who might relate.
The curse of the 'use case' comes in here too. When people think that everything should have a use case, that's a lot of training data suggesting to a model that things should only be used for what someone has already thought of.
A couple of times I have had to manually code proof-of-concept pieces so that the model breaks out of that "unpossible" mode and actually helps me.
I can't remember if it was ChatGPT or Claude, but when I showed it how to get a MessagePort in its JavaScript executor through to the artifact/canvas, it quickly went from "That can't be done" to positively enthusiastic about the possibilities. I suspect those shenanigans will be well off the table for Fable though.
(Joking aside, see sibling threads.)
Did you add "make no mistake" to your prompt?
Recently (last couple of months?) these models have become useful tools for mathematicians, because they can solve easier problems more quickly, meaning that one can tackle bigger challenges (but maybe not RH et al.) piece by piece.
But there are still definite limits: problems one could expect an expert human to solve, given time, that the models do not. Thus, more intelligence would be nice!
I am pretty sure I am catching the sarcasm this time. Kudos, you had me in the first half.