I also use Claude premium daily for another client, and I use Codex, and I can tell you that GLM5 is at this point much more capable than Claude and Codex for complex backend work, complex feature planning, and long-horizon tasks. One thing I've noticed is that it is particularly good at following instructions and guidelines, even deep into the execution of a plan.
To me the only problem is that z.ai have had trouble with inference: the performance of their API has been pretty poor at times. It looks like this is a hardware issue related to the Huawei chips they use rather than an issue with the model itself. The situation has been improving substantially over the past few weeks.
GLM5.1, GLM5-Turbo and GLM5v are at this point better than Opus, Codex, Gemini and other closed source models. We have reached a major turning point. To me, the only closed source model still in the game is Codex, as it is much faster at executing simple tasks and implementing already-created plans.
Try GLM5v for your PDF work; it's their latest-generation vision model, released a couple of days ago.
>For AI computing, the Atlas 950 SuperPoD, powered by UnifiedBus, integrates 64 NPUs per cabinet and can scale up to 8,192 NPUs, delivering superior performance for large-scale AI training and high-concurrency inference.
Codex and GLM didn't have any issue following the exact same plan and getting a working app, so I would argue Gemini is the failure here.
"It couldn't even debug some moderately complicated python scripts reliably."
What a wild claim to make. It's unsupported by benchmarks, unsupported by the consensus of the community, and no evidence was provided.
Sounds like in another comment here even the GLM5 team concedes they are behind the frontier wrt tool calling, do you know something they don’t?
My only goal is to encourage people to try it out so they can see if it moves the needle for them, because there's a fair chance that it will. I am not trying to start a flamewar or anything.
You’re making a claim, and I’m pointing out that it’s unsubstantiated and not consistent with any other source of data, including that internal to the company that makes the model.
I hope you can see that that's different from saying "it worked well for me."
I do not think that anyone who read my comment understood it differently. But I grant you this point: this is just my opinion based on my personal experience, not the result of a scientific study.
That said, I wasn't submitting a scientific paper for preprint, just posting my opinion on an internet forum.
Not sure why you are making such a big deal out of it, especially for something people can decide within minutes whether it works for them or not. And I haven't seen you nitpick other people saying that all Chinese models are garbage incapable of doing even the most basic task, without quoting any study. This kind of scrutiny tends to be one-sided.
Edit: and regarding what the z.ai team is saying about their models, just check their Discord and the articles they link there. They themselves say that their latest models have leading performance on a number of aspects. It is misleading to suggest that the model's authors are not proudly claiming best-in-class performance.
https://huggingface.co/trohrbaugh/gemma-4-31b-it-heretic-ara...
which was produced immediately after Google released their new Gemma 4 model.
For all existing models, including all SOTA models, you can find contradictory statements: that they suck and that they are great.
It is very likely that all these statements are true simultaneously, because each model may succeed at some tasks and fail at others; so without specifying the tested tasks, any claim that a model is good or bad is worthless.
I had no such trouble with 4.7 and find it fast and productive. I haven't tried 5.1; I am using OpenAI models for coding most of the time.
Z.ai seem to promote 4.7 for smaller tasks and 5.1 for larger tasks (similar to Anthropic's recommendations for the Haiku and Sonnet/Opus models).
5.1 already works for me on the most economical basic paid tier (the "lite coding plan"), unlike the first release of v5 (5.0?).
There are no such models, depending on your definition of censorship. If you're referring to abliteration and similar automated techniques, they're snake oil.