Apparently Chinese is more information-dense per token than English, so this wasteful thinkslop isn't as damaging in Mandarin. So the developers focused mostly on Mandarin and didn't think about handling the thinkslop, while American AI labs do.
Being willing and able to reconsider seems very good. Going around and around, pulling in more thinking, integrating it: maybe that's why it is as good as it is.
I want to emphasize again how excellent it is that we can see the thinking. I think this makes GLM so much better an experience for me. It gives me such insight into what is being considered, helps me see where things go wrong. It grounds me, gives me the notion of where the results come from. It was so jarring to switch to GPT and Opus and find that they won't discuss with me, won't reveal their thinking: that feels fundamentally unsafe, for me, for society, to have such a severe black box. I don't think it should be allowed, honestly.
Many thanks to this recent submission, which is the first time I've seen anyone blog about this core difference: The text in Claude Code’s “Extended Thinking” output is not authentic. https://patrickmccanna.net/the-text-in-claude-codes-extended... https://news.ycombinator.com/item?id=48630535
I gave it some simple code porting exercises and watched dumbfounded at the reasoning, which was more like the ravings of a lunatic - but lo and behold, after much confusion and a dizzying number of eureka moments the task was completed very successfully.
I tried Kimi on a similar task, much faster, a little more reassuring somehow in its ramblings, also surprisingly good results.
To be clear, I'm surprised the results were good not because they're not GPT or Claude, but because the line of reasoning was so bonkers. Coming from Claude, I was just not used to seeing this, but I'll bet it's just as nuts with the frontier models and we're just not allowed to see it (I'm about to read the links you shared).
Agree wholeheartedly that transparency is of grave importance.
Consider debugging - you start off in one place, think you have worked out what is happening, and then there is an "oh but what about xxx" thing that happens and you explore another branch. Then you "have it for sure" until you find another edge case.
The LLM is doing something analogous. It's writing circuits to try to emulate your program. Each time it gets one that seems right it is very sure that circuit is correct, but then it finds another thing.
At any point you can stop and go "write code now" and it will, and the code will seem fine provided it hasn't hit one of these edge cases.
Turning up thinking time is literally forcing more exploration.
The words that come out are amusingly dramatic, but... TBH when I debug I'm often like "WTF" and throwing my hands up in the air at some gotcha I didn't expect.
Now I see the issue clearly! But wait... now I have the full picture! But wait... Found it!
I gave up a few times because of it at first, until I realized I just had to let GLM get on with it, and what came out was great!
But once it was outright endearing: on a challenging bug, it said "I have been very thorough." Then it escalated where to look and aced it. Built-in Confucian values.
I started noticing those in GitHub Copilot right around when they turned off thinking traces at the end of last year.