Nowadays, I have often seen LLMs (Opus 4.5) give up on their original ideas and assumptions. Sometimes I tell them what I think the problem is, and they look at it, test it out, and decide I was wrong (and I was).
There are still times when they get stuck on an idea, but those are becoming increasingly rare.
Therefore, I think modern LLMs are clearly already able to question their assumptions and notice when the framing is wrong. In fact, they've been invaluable to me in fixing complicated bugs in minutes instead of hours because of how readily they question assumptions and throw out hypotheses. They've helped _me_ question some of my assumptions.
They're inconsistent, but they have been doing this. Even to my surprise.
Yet given an existing codebase (even one that isn't huge), they often won't suggest "we need to restructure this part differently to solve this bug". Instead they tend to push forward.
Having realized that, perhaps you are right that we may need a different architecture. Time will tell!
Have you tried actually prompting for this? It works.
They can give you lots of creative options about how to redefine a problem space, with potential pros and cons of different approaches, and then you can further prompt to investigate them more deeply, combine aspects, etc.
So many of the higher-level things people assume LLMs can't do, they can. But they don't do them "by default", because when someone asks for the solution to a particular problem, they're trained to just solve the problem the way it's presented. But you can just ask the model to behave differently and it will.
If you want it to think critically and question all your assumptions, just ask it to. It will. What it can't do is read your mind about what type of response you're looking for. You have to prompt it. And if you want it to be super creative, you have to explicitly guide it in the creative direction you want.
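To make "just ask it to" concrete, here's a rough sketch of wrapping a problem statement with an explicit request to challenge its framing. The function name `build_critical_prompt` and the exact wording are my own illustration, not anything from a particular tool or API; it only builds the text you would paste or send to whatever model you use.

```python
def build_critical_prompt(problem: str, assumptions: list[str]) -> str:
    """Wrap a problem with an explicit request to challenge its framing.

    Hypothetical helper for illustration: it only builds a string and
    does not call any model or library.
    """
    lines = [
        "Before proposing a fix, question the framing of this problem.",
        "For each assumption below, say whether you think it holds and why.",
        "If restructuring the code would be better than a local patch, say so.",
        "",
        "Problem:",
        problem.strip(),
        "",
        "Assumptions to challenge:",
    ]
    # Number the assumptions so the reply can refer back to them.
    lines += [f"{i}. {a}" for i, a in enumerate(assumptions, start=1)]
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up example inputs, purely illustrative.
    print(build_critical_prompt(
        "Cache returns stale data after a deploy.",
        ["The cache key is unique", "The TTL is respected"],
    ))
```

Whether this particular wording works best is an open question; the point is only that the request to question assumptions is stated outright instead of being left for the model to guess.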
I don't think there's anything you can't do by "predicting the next token really well". It's an extremely powerful and extremely general mechanism. Saying there must be "something beyond that" is a bit like saying physical atoms can't be enough to implement thought and there must be something beyond the physical. It underestimates the nearly unlimited power of the paradigm.
Besides, what is the human brain if not a machine that generates "tokens" that the body propagates through nerves to produce physical actions? What else but a sequence of these tokens would a machine have to produce in response to its environment and memory?
Ah yes, the brain is as simple as predicting the next token, you just cracked what neuroscientists couldn't for years.
Couple that with all the automatic processes in our minds (filled-in blanks that we never observed, yet are convinced we did), and hormone states that drastically affect our thoughts and actions.
And the result? I'm not as big a believer in our uniqueness or level of autonomy as so many people are.
With that said, I am in no way saying LLMs are even close to us, or are even remotely close to the right implementation to be close to us. The level of complexity in our "stack" alone dwarfs LLMs. I'm not even sure LLMs are up to a worm's brain yet.
In my experience, if you present something in the context window that is sparse in the training data, there's no depth to it at all, only what you tell it. And it will always creep toward or revert to the nearest statistically significant answers, with claims of understanding and zero demonstration of that understanding.
And I'm talking about relatively basic engineering-type problems here.
But I may easily be massively underestimating the difficulty. Though in any case I don't think it affects the timelines that much. (personal opinions obviously)