Also, and I know you may not want to answer, but could you give me an idea of the kinds of things you found GLM to be worse at?
I think I've been fairly unbiased in testing it on a bunch of different development tasks, but I'm curious whether it performs well on some things and not others. So it would help if you could share what you feel it's worse at.
Also, are you an experienced developer or less experienced?
When DeepSeek V4 Pro came out, I had been mostly coding with GLM-5.1 on a Z.ai coding plan.
I had a large analysis task on a relatively complex codebase, so I decided to try the models out on it.
GLM-5.1 did acceptably but got a few things wrong (easily corrected) and took quite a while to get there.
Opus 4.6 burnt through the US$10 budget I had given it in about 10-15 min, without ever returning from the first prompt.
DeepSeek V4 returned a full analysis within 2-3 min, and I carried on all the way to implementing the feature I was after. Total cost less than US$1.00.
I now mostly alternate between GLM-5.1 and DeepSeek V4 Flash, with an occasional dip into V4 Pro for more complex analyses.
Right now everyone is using the latest and greatest to do dumb stuff like that. That would change fast if companies started caring about costs.