by black_knight 5 days ago

by user43928 5 days ago

I also see a lot of people saying they are happy with weaker models.

At work I had to switch to using GPT 5.4 Mini and Qwen 3.6 27B.

The results were near useless.

The error rate is through the roof; they are constantly incorrect in their conclusions, even when investigating very simple issues.

Further, the models are too unreliable to even move 20-line snippets around without inadvertently modifying them. Ask them to correct it and they still get it wrong.

Maybe the larger Chinese models are better, but the Mini stuff is next to useless to me.

by black_knight 5 days ago

I have Qwen 3.6 27B and 35B running locally, and coming from Opus it feels like talking to an imposter. Someone who pretends to be competent, but really isn’t. Results are always disappointing. Sonnet is better, but I have given up on asking it. Even for simple things I wait for my Opus limits to reset.

by abalashov 4 days ago

Have you tried Kimi K2.6 or DeepSeek V4 (Flash or Pro)?

by daymanstep 5 days ago

What kind of problems are you trying to have it solve?

by _kb 5 days ago

The Riemann hypothesis, PvNP, and the Collatz conjecture.

by black_knight 5 days ago

Not these. I wonder if the well is poisoned there. The models know that these are "unpossible", so they might not solve them just because… Maybe some day.

I am just testing it on stuff I know intimately myself. I would probably not understand a proof of Collatz if it was dancing in front of me!

by komali2 5 days ago

So, what kind of problems are you having it try to solve?

Sorry to belabor this but it's basically pointless saying you have nuts it can't crack without showing us the nuts.

by black_knight 5 days ago

I don’t care to share my exact problems. Mostly because GPT-5.5 hallucinates false solutions, and I would rather not have people reply with "Oh but ChatGPT solves it!", because it takes expert knowledge to debunk them. To its credit, ChatGPT will admit its very fundamental mistakes when they are pointed out. But also because no one would really care.

I gave a high-level description of the problems in a sibling thread. They are the kind of small problems which I suppose every researcher has lying around, waiting for them to think about some day. They are not the big problems everyone is waiting to see solved.

My comment was not meant to be a tease – sorry! I assumed there would be other people in a similar situation, who might relate.

by neonstatic 5 days ago

Bro, you are being left behind bro, it's amazing bro...

by Lerc 5 days ago

That's a bit of a tricky point. I have had quite a lot of problems with models informing me that what I am attempting is impossible. If no one has done it, or at least the model doesn't know of it being done, it tends to fall back on people voicing their baseless speculations, and for just about anything you propose, you can find a person who will loudly proclaim it is impossible.

The curse of the 'use case' comes in here too. When people think that everything should have a use case, that's a lot of training data suggesting to a model that things should only be used for what someone has already thought of.

A couple of times I have had to manually code proof-of-concept pieces so that the model breaks out of that "unpossible" mode and actually helps me.

I can't remember if it was ChatGPT or Claude, but when I showed it how to get a MessagePort in its JavaScript executor through to the artifact/canvas, it quickly went from "That can't be done" to positively enthusiastic about the possibilities. I suspect those shenanigans will be well off the table for Fable though.

by unnouinceput 5 days ago

Stop dancing and share the prompt; we're dying to see it.

by black_knight 5 days ago

Hey, stop asking to see my nuts! My nuts are private – okay?

(Joking aside, see sibling threads.)

by andriy_koval 4 days ago

> The Riemann hypothesis, PvNP, and the Collatz conjecture.

Did you add "make no mistake" to your prompt?

by mastermage 5 days ago

Is this a joke? Seriously? These are some of the hardest problems in math, period. Hundreds if not thousands of the greatest minds in history have attempted to solve these problems. And you think that the current level of AI can blow through them? It is also possible that, for example, the Riemann Hypothesis is just not provable (Gödel's theorem).

by black_knight 5 days ago

No one is expecting that! I expect _kb was sarcastic/making a point.

Recently (last couple of months?) these models have become useful tools for mathematicians, because they can solve easier problems more quickly, meaning that one can tackle bigger challenges (but maybe not RH et al.) piece by piece.

But there are still definite limits, where one could expect an expert human to solve things, given time, but models do not. Thus, more intelligence would be nice!

by mastermage 5 days ago

If it was sarcastic, then whoosh on me.

by _kb 4 days ago

It was a bit of humour. It would be much more feasible to have an LLM generate programs that solve those problems rather than solving them directly. I tried to make a start, but I couldn't even vibe a simple tool that would let me reliably validate if generated solvers would halt or loop forever.

by mastermage 4 days ago

> if generated solvers would halt or loop forever.

I am pretty sure this time I am catching the sarcasm here. Kudos, you had me in the first half.

by moffkalast 4 days ago

Ayy lmao

by black_knight 5 days ago

The medium ones are results where one needs to construct some object, which my intuition tells me should exist. The difficult ones are typically to show that certain objects cannot be constructed.

These are not Fields Medal-type problems, nor known difficult/open conjectures. Just small stuff I have collected in my todo list over the years.

by Certhas 5 days ago

I have some medium-difficulty math problems on which I have repeatedly used the models over the last year and a half. Back then they were already good at pointing out obstructions and constructing counterexamples. So that tracks. But at first glance it looks like Fable actually made real progress on one problem for the first time.

A year ago my judgement was that I had wasted my time on trying to work with the models and doing things myself would have been more productive as I would have gained intuition from the failures. Now it definitely seems to have figured out stuff that would have taken me more time than I have to spare on this problem...

by black_knight 5 days ago

Cool! Yes, we are getting there.

Being a theory builder more than a problem solver, I am excited for the future.

Also excited for fully formalised mathematics to hit the mainstream!

by tclancy 5 days ago

Perhaps you should rephrase those nuts?