by adamgordonbell11 hours ago |

by urutom4 hours ago|

What I find fascinating about the shared prompt isn’t just the result, but the visible thinking process. Math papers usually skip all the messy parts and just present the polished proof. But here you get something closer to their notepad. I also find it oddly endearing when the AI says things like “Interesting!” It almost feels like a researcher encouraging themselves after a small bit of progress. It gives me the rare feeling of watching the search itself, not just the final result.

by bertil1 hours ago|

> the AI says things like “Interesting!”

My experience of those utterances is that they’re purely phatic mimicry: they lack genuine intuitive surprise and just mark a very odd shift in direction. The problem isn’t the lack of a path; it’s that the rhetorical follow-ups to those leaps are usually relevant results, so the stream of tokens ends up rapidly overplaying its own conviction. That’s why it’s necessary (and often ineffective) to tell them to validate their findings thoroughly: too much of their training is “That’s odd” followed by “Eureka!” and not “Nevermind…”

by sigbottle1 hours ago|

I think that a lot of models have to sprinkle in a lot of "fluff" in their thinking to stay within the right distribution. They have language as their only medium; the way we annotate context is via brackets and then training them to hopefully respect the brackets. I'd imagine that either top labs explicitly train, or through the RL process the models implicitly learn, to spam tokens to keep them 'within distribution', since everything's going through the same channel and there's no fine-grained separation between things.

Philosophically, it's not like you're a detached observer who simply reasons over all possible hypotheses. Ever get stuck in a dead end and find it hard to dig yourself out? If you were a detached observer, it'd be pretty easy to just switch gears. But it's not (for humans).

by jackcarter50 minutes ago|

It’s funny that this is probably due to bias in the training texts, right? Humans are way more likely to publish their “Eureka!” moments than their screwups… if they published both, maybe models wouldn’t exhibit this behavior.

Now that AI labs have all these “Nevermind” texts to train on, maybe it’s getting easier to correct? (Would require some postprocessing to classify the AI outputs as successful or not before training)

by Forgeties7941 minutes ago|

My understanding is that it’s the result of these companies making sure to keep you engaged/happy, and less the result of the data these companies train with.

I don’t know if it’s true or not but it certainly tracks given LLMs are way more polite than the average post on the internet lol

by 1 hours ago|

deleted

by animal53136 minutes ago|

I've somehow managed to train mine out of trying to fluff me up the whole time; it's become very factual.

Overall it saves me a lot of time reading when it's just focusing on the details.

by epolanski36 minutes ago|

Interestingly this is strikingly similar to how my mind would process something I find genuinely interesting.

by rafaelmn1 hours ago|

This is another underrated benefit of working with LLMs. When I work I don't take detailed notes about my thinking, decisions, context, etc. I just focus on code. If I get interrupted it takes me a while to get back into the flow.

With LLMs I just read back a few turns and I'm back in the loop.

by notahacker1 hours ago|

The actual iteration through various learned approaches to dealing with problems I'd probably find fascinating if I understood the maths! Especially if I knew it well enough to know which approaches were conventional and which weren't.

I find the AI pronouncing things "interesting!" less interesting on the basis that even though in this case it crops up in the thinking rather than flattering the user in the chat, it's almost as much of an AI affectation as the emdash.

by jdmichal1 hours ago|

I always assumed the "interesting!" markers were actual markers. A kind of tag for the system to annotate its context.

by notahacker1 hours ago|

Probably does function like that in terms of highlighting context, in this case probably to the system's benefit.

But in general, exclamations of "interesting!" seem like the stereotypical AI default towards being effusive, and we've all seen the chat logs where an AI trained to write that way responds with "interesting", "great insight!" to a user's increasingly dubious inputs, which is an antipattern...

by andrepd48 minutes ago|

The simulacrum of a thing is not the thing! Not only is the "interesting!" unrelated to any "thought process", the whole """thinking""" output is not a representation of a thought process but merely a post-facto confabulation that sounds appropriately human-like.

by pglevy40 seconds ago|

Can't help but think of this passage from Nietzsche that I re-read recently:

> When I analyze the process that is expressed in the sentence, "I think," I find a whole series of daring assertions that would be difficult, perhaps impossible, to prove; for example, that it is I who think, that there must necessarily be something that thinks, that thinking is an activity and operation on the part of a being who is thought of as a cause, that there is an "ego," and, finally, that it is already determined what is to be designated by thinking—that I know what thinking is.

by clejack9 minutes ago|

Yes, I recently got access to an annotations platform for LLMs, and I've found many projects associated with generating chain-of-thought outputs.

These CoT outputs are the same sort of illusion as the general output. Someone is feeding them scripts of what it looks like to solve problems, so they generate outputs that look like problem solving.

I can't remember if I mentioned it previously on here, but an LLM seems to be an extremely powerful synthesis machine. If you give it all of the individual components to solve a complex problem that humans might find intractable due to scope or bias, it may be able to crack the problem.

by cubefox57 minutes ago|

[dead]

by petra1 hours ago|

I don't have ChatGPT, only Gemini and Claude. But how do you make a language model think for 80 minutes?

by zeven752 minutes ago|

I have Gemini and ChatGPT and keep them on the highest thinking settings. ChatGPT will regularly think 40-60 minutes on the same problem that Gemini will think 10-15 minutes on. The quality of ChatGPT’s response is usually a little higher, but not that much higher. My takeaway is that Gemini is better at thinking faster and maybe has better, more dedicated hardware behind it. I use Gemini if I want a faster answer, but ChatGPT if I want to push the quality of the answer a little higher.

by somewhatgoated1 hours ago|

It has a “high effort” mode that makes it think for a really long time.

by staticassertion45 minutes ago|

In my experience, you can tell them "Don't stop working on this until complete" and they'll go for an hour or more.

by baxtr1 hours ago|

Give it hard enough problems?

by chvid4 hours ago|

I am curious whether there is a “harness” for maths out there (like the system prompt and tool collection in Claude Code, but for maths instead of coding).

Something that asks the LLM to structure its response as plan and implementation, and allows it to call tools like Python, Sage, Lean, etc.
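
To make the idea concrete, here is a minimal sketch of the tool-dispatch half of such a harness. Everything in it is an assumption for illustration: the tool-call format (a dict with "tool" and "input" keys) and the names `run_python`, `TOOLS`, and `dispatch` are hypothetical and not taken from Claude Code or any real harness. Only a Python tool is wired up; Sage or Lean would be registered the same way if they are installed.

```python
import subprocess
import sys


def run_python(code: str, timeout: float = 10.0) -> str:
    """Run a snippet in a fresh interpreter; return stdout, or stderr on failure."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.stdout if proc.returncode == 0 else proc.stderr


# Hypothetical tool registry: tool name -> callable taking the tool input.
TOOLS = {"python": run_python}


def dispatch(tool_call: dict) -> str:
    """Route one model-issued tool call (hypothetical format) to its tool."""
    name = tool_call["tool"]
    if name not in TOOLS:
        return f"unknown tool: {name}"
    return TOOLS[name](tool_call["input"])
```

A harness loop would feed the returned string back to the model as the tool result; the plan/implementation structure would live in the system prompt rather than in this code.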

by brandensilva2 hours ago|

Also curious about this, it seems like it would be important to guide these tools more specifically based on the domain of expertise.

by ndriscoll32 minutes ago|

Why wouldn't you just use coding agents and ensure you have e.g. Lean and Mathlib in the environment?

by nycdatasci10 hours ago|

Tried w/ 5.5 Pro, Extended Thinking. 17 minutes:

-----------------------------

Yes. In fact the proposed bound is true, and the constant 1 is sharp.

Let w(a) = 1/(a log a).

I will prove that, uniformly for every primitive A ⊂ [x, ∞), ∑_{a∈A} w(a) ≤ 1 + O(1/log x), which is stronger than the requested 1 + o(1).

https://chatgpt.com/share/69ed8e24-15e8-83ea-96ac-784801e4a6...
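
For anyone who wants to poke at the statement numerically, here is a small sketch. It only checks primitivity and evaluates the sum of w(a) = 1/(a log a) on finite sets, so it says nothing about whether the linked proof is correct. The function names are my own, and inputs are assumed to be integers ≥ 2 (log 1 = 0 would divide by zero).

```python
import math


def is_primitive(nums):
    """True if no element divides a different element of the set."""
    s = sorted(set(nums))
    return not any(b % a == 0 for i, a in enumerate(s) for b in s[i + 1:])


def weight_sum(nums):
    """Sum of w(a) = 1/(a log a) over the distinct elements (all assumed >= 2)."""
    return sum(1.0 / (a * math.log(a)) for a in set(nums))


def dyadic_block(n):
    """Integers in (n/2, n]: primitive, because 2a > n for every element a."""
    return list(range(n // 2 + 1, n + 1))


block = dyadic_block(1000)
print(is_primitive(block), weight_sum(block))
```

The dyadic block is just one easy family of primitive sets living in [x, ∞) with x = n/2 + 1; it is nowhere near extremal, so its sum comes out well below 1.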

by mrabcx2 hours ago|

Tried the same prompt in DeepSeek 4

https://chat.deepseek.com/share/nyuz0vvy2unfbb97fv

Comes up with a proof.

by adamgordonbell1 minutes ago|

Are these proofs equivalent? Pretty cool if so.

by cryptoegorophy10 hours ago|

Mine took 20 min on Pro: https://chatgpt.com/share/69ed83b1-3704-8322-bcf2-322aa85d7a... But I wish I was math smart enough to know if it worked or not.

by liweic5 hours ago|

Weirdly enough, Pro + extended with the same prompt just output the answer directly without thinking: https://chatgpt.com/s/t_69edd2d9dc048191b1476db92c0dedf8 . Does this mean the result was cached, or that it silently routes to a different model based on the user?

by Vachyas4 hours ago|

The link you provided is for a canvas I think rather than the convo

by vjerancrnjak8 hours ago|

Ask it to formalize it in Lean.

by utopiah8 hours ago|

If they aren't "smart enough" to know if it works, they are most likely also unable to verify that the Lean formalization actually matches the problem they were trying to solve.

by timjver5 hours ago|

Verifying that every step in a (potentially long) proof is sound can of course be much, much harder than verifying that a definition is correct. That's kind of the whole point.

by LeCompteSftware5 hours ago|

That's not what the parent comment meant. They meant checking the Lean-language definitions actually match the mathematical English ones, and that the Lean theorems match the ones in the paper. If that's true then you don't actually need to check the proofs. But you absolutely need to check the definitions, and you can't really do that without sufficient mathematical maturity.
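
To show what "checking the definitions" looks like in practice, here is a sketch of how the claimed bound upthread (w(a) = 1/(a log a) over primitive A ⊂ [x, ∞)) might be stated in Lean 4 with Mathlib. This is my own guess at a formalization, not the one from any of the linked chats: `IsPrimitiveSet` and `primitive_sum_bound` are names I made up, the proof is left as `sorry`, and I have not run it through Lean.

```lean
import Mathlib

/-- ASSUMED definition: a set of naturals is primitive if no element
    divides a different element. -/
def IsPrimitiveSet (A : Set ℕ) : Prop :=
  ∀ a ∈ A, ∀ b ∈ A, a ∣ b → a = b

/-- Statement only (proof omitted): the claimed bound with w(a) = 1/(a log a).
    Things a reader must check by hand: that `x ≤ a` encodes A ⊂ [x, ∞),
    that `2 ≤ x` keeps `Real.log` positive, and that `∑'` (Mathlib's `tsum`)
    is defined to be 0 when the family is not summable. -/
theorem primitive_sum_bound :
    ∃ C : ℝ, ∀ x : ℕ, 2 ≤ x → ∀ A : Set ℕ, IsPrimitiveSet A →
      (∀ a ∈ A, x ≤ a) →
      ∑' a : A, (1 : ℝ) / (((a : ℕ) : ℝ) * Real.log ((a : ℕ) : ℝ))
        ≤ 1 + C / Real.log (x : ℝ) := by
  sorry
```

Even in this dozen lines there are several places where the Lean text could quietly differ from the English statement (the `tsum` convention being the sneakiest), and spotting those is the part that needs mathematical maturity.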

by smallnamespace4 hours ago|

Yes, and the child comment’s point is that formalizing the problem is likely easier than having the LLM verify that each step of a long deduction is correct, which is why Lean might be helpful.

by LeCompteSftware2 hours ago|

But both of you are ignoring the parent comment! Actually you're ignoring the context of the thread.

Originally someone said "I wish I was math smart to know if [this vibe-mathematics proof] worked or not." They did NOT say "I'd like to check but I am too lazy." Suggesting "ask it to formalize it in Lean" is useless if you're not mathematically mature enough to understand the proof, since that means you're not mathematically mature enough to understand how to formalize the problem.

Then "likely easier" is a moot point. A Lean program you're not knowledgeable enough to sanity-check is precisely as useless as a math proof you're not knowledgeable enough to read.

by utopiah2 hours ago|

thanks

by dbdr8 hours ago|

That's great if it works. But it's way harder to produce a formal proof. So my expectation is that this will fail for most difficult problems, even when the non-formal proof is correct.

by DonHopkins7 hours ago|

Formalize this in the form of an Iranian Lego Trump Dis Rap video.

by UltraSane15 minutes ago|

The total flops it consumed during those 80 minutes is crazy.

by jgalt21221 minutes ago|

> "Thought for 80m 17s"

Is there any good rule of thumb for how many kWh of electricity this is?
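
I don't know of an established rule of thumb, but the arithmetic itself is just power times time. A rough sketch follows, where the hardware figures are placeholder assumptions and not known numbers for this run: I assume 8 accelerators at 700 W each, and the `share` parameter (the fraction of that hardware attributed to one request) is hypothetical too, since real serving batches many requests on the same node.

```python
def energy_kwh(power_watts: float, seconds: float, share: float = 1.0) -> float:
    """Energy in kWh for a load of power_watts running for the given seconds.

    share is the fraction of the hardware attributed to this one request
    (hypothetical; real serving batches many requests per node).
    """
    joules = power_watts * seconds * share
    return joules / 3.6e6  # 1 kWh = 3.6e6 J


thinking_seconds = 80 * 60 + 17   # "Thought for 80m 17s"
assumed_node_watts = 8 * 700      # ASSUMPTION: 8 accelerators at 700 W each
print(round(energy_kwh(assumed_node_watts, thinking_seconds), 2))  # prints 7.49
```

Under those made-up inputs the answer is about 7.5 kWh if the whole node were dedicated to the request; with a smaller `share` it scales down linearly, and it excludes cooling and other datacenter overhead.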

by sfdlkj3jk342a2 hours ago|

When using the web interface for ChatGPT like this, is there any way to tell which model is actually being used?

by DeathArrow4 hours ago|

>don't search the internet.

I think this was key. Otherwise the LLM could think it can't be done.

by amelius2 hours ago|

But it was trained on the internet.

by embedding-shape4 hours ago|

"Knowing" (guessing, really) what is and isn't possible is a huge deciding factor in whether you can do a thing. If you "know" it isn't possible, you'll probably never be able to do it; but if you don't know it isn't possible, it is possible :)

by ipaddr11 hours ago|

Tried the same prompt on the free plan and ended up nowhere close.

by jasonfarnon11 hours ago|

Is there a known lag before the Pro plan's abilities migrate to the free plans?

by brianjking11 hours ago|

GPT 5.5 Pro is not available on any consumer plan outside the ChatGPT Pro tier ($100 or $200); otherwise it is only available through the API.

by jasonfarnon10 hours ago|

Yes, but don't we expect GPT 5.5 Pro will eventually be a free tier? Maybe I'm missing something because I only use the free tier. But the free tier has gotten way better over the last few years. I'm pretty sure, based on descriptions on this site from paid subscribers, that the free tier now is better than the paid tier of say 2 years ago. That's the lag I'm wondering about.

by manfromchina19 hours ago|

Free ChatGPT is like a fast car with a barely responsive steering wheel. The guardrails on that thing are insane, even for math. It won't let you think. It will try to fix mistakes you haven't even made yet, based on intent that was ascribed to you for no reason. It veers off in some crazy directions thinking that's what you meant, and trying to address even a little bit of that creates almost a combinatorial explosion of even more wrong things. That's why I stick to Claude. The latter is chill and only addresses what you typed. It isn't verbose and actually asks you what you're getting at with your post. That said, ChatGPT is more technical and can easily solve math problems that stump Claude.

by nextaccountic7 hours ago|

So this doesn't happen in the paid plans of ChatGPT? But why?

by virgildotcodes4 hours ago|

Paid plans give you access to much larger, more intelligent models which have thinking enabled (inference time compute). In the example here you can see GPT Pro taking 20-80 minutes to respond with the proof.

All this is far more expensive to serve so it’s locked away behind paid plans.

by nextaccountic22 minutes ago|

> thinking enabled (inference time compute)

What do you mean by compute?

by vessenes10 hours ago|

I do not think this is true. You will continue to get smaller, cheaper-to-host models in the free tier that are distilled from current and former frontier models. They will continue to improve, but I’d be very surprised if, e.g., 5.4-mini (I think this is the free tier model) beat o3 on many benchmarks, or real world use cases.

I won’t even leave ChatGPT on “Auto” under any circumstances: it’s vastly worse on hallucinations, sycophancy, everything, basically.

Anyway, your needs may be met perfectly fine on the free tier product, but you’re using a very different product than the Pro tier gets.

by 7 hours ago|

deleted

by hyraki10 hours ago|

You should pay for it if you find value in it.

by amazingman9 hours ago|

They pay for it with their personal data.

by 9 hours ago|

deleted

by andai10 hours ago|

Tangential but I learned today that GPT-5.5 in ChatGPT (Plus) has a smaller context window than the one in the API. (Or at least it thinks it does.)

I'd guess / hope the Pro one has the full context window.

by refulgentis10 hours ago|

Notably, 5.5 has a higher price on the API for context sizes larger than ChatGPT's, and 5.5 Pro on the API does not differentiate based on context size (it’s eye-bleedingly expensive already :)

by vessenes10 hours ago|

Do not use the free plan. It is not good.

by Someone123411 hours ago|

Does the free plan even have access to thinking models?

by jychang11 hours ago|

Technically yes, gpt-5.4-mini is available on the free plan

by Matticus_Rex11 hours ago|

Was this a surprise?

by ArtIntoNihonjin9 hours ago|

[dead]

by 11 hours ago|

deleted
