upvote
GLM 5.2 is great but it heavily detoriates once the context window gets past 200k tokens.

I've had more success with creating a plan first and then implementing it in (short-lived) sub-agents.

Ironically good software architecture patterns (small functions, single responsibility) heavily impact the performance of these models as well. They do surprisingly well in well architectured codebases.

They do very poorly in anything that's a mess where Opus and GPT 5.5 still get reasonable performance.

reply
Yeah the benchmark for sure isn't perfect and without super rigid prompting it is far too easy for it to get off course. 28% hallucination rate isn't nothing either
reply