I think it's confusing enough it's a brand harm. I offer no solutions, unfortunately. I guess you could do a little posthoc analysis for plus subscribers on up and determine if they'd benefit from default Thinking mode; that could be done relatively cheaply at low utilization times. But maybe you need this to keep utilization where it's at -- either way, I think it ends up meaning my kids prefer Claude. Which is fine; they wouldn't prefer Haiku if it was the default, but they don't get Haiku, they get Sonnet or Opus.
For this to work well, the instant reply must be truly instant and the button must always be visible and at the same position in the screen (i.e. either at the top or bottom, of the answer, scrolling such that it is also at the top or bottom of the screen), and once the thinking answer is displayed, there should be a small icon button to show the previous instant answer.
(It seems absurd that to consider that there may be no undo button that the machine can push.)
Just noting that I'm not against differentiation in products, but it gets very confusing for users when there's too many options (in the case of the consumer ChatGPT at least this is still more limited than in pre-GPT 5 days). The issue is that there's differentiation at what I pay monthly (free vs plus vs pro) and also at the model layer - which essentially becomes this matrix of different options / limits per model (and we're not even getting into capabilities).
For someone who uses codex as well, there are 5 models there when I use /model (on Plus plan, spark is only available for Pro plan users), limits also tied to my same consumer ChatGPT plan.
I imagine the model differentiation is only going to get worse as well since with more fine tuned use cases, there will be many different models (ie health care answers, etc.) - is it really on the user to figure out what to use? The only saving grace is that it's not as bad as Intel or AMD cpu naming schemes / cloud provider instance naming, but that's a very low bar.
In my case it would be more useful to have a slider of how much I'm willing to wait. For example instant, or think up to 1 minute, or think up to 15 minutes.
For coding I love codex-5.3-xhigh, but for non-coding prompts I still far prefer o3 even if it's considered a legacy model.
I can imagine that its higher tool use is too expensive to serve, but as a pro user I would love it to come back.
I've long suspected as much, but I always found the API model name <-> ChatGPT UI selector <-> actual model used correspondence very confusing, and whether I was actually switching models or just some parameters of the harness/model invocation.
> One series is the Instant series, which is faster and more tuned to ChatGPT, but less accurate.
That's putting it mildly. In my experience, the "instant/chat" model is absolute slop tier, while the "thinking" one is genuinely useful and also has a much more palatable tone (even for things not really requiring a lot of thought).
Fortunately, the latter clearly identifies itself with an absurd amout of emoji reminiscent of other early chatbots that shall not be named, so I know how to detect and avoid it.
hide away the extra complexity for everyone. give power users a way to get it back.
VIM does this well: no UI, magic incantations to use features.