by xlii 11 hours ago

by LaurensBER 11 hours ago

GLM 5.2 is great, but it deteriorates heavily once the context window gets past 200k tokens.

I've had more success with creating a plan first and then implementing it in (short-lived) sub-agents.
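For what it's worth, here is a rough sketch of what that plan-then-sub-agents loop could look like. Everything in it is an assumption made for illustration: `ModelFn`, `SubAgent`, `run_plan`, and the two stub models are hypothetical names, not any real agent framework's API. The only point is that each step runs in a fresh, small context instead of one ever-growing one.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical model interface (an assumption): prompt text in, reply text out.
ModelFn = Callable[[str], str]


@dataclass
class SubAgent:
    """Short-lived worker that handles exactly one plan step."""
    model: ModelFn
    context: List[str] = field(default_factory=list)

    def run(self, step: str, summary: str) -> str:
        # Only a short summary and the current step go in,
        # never the full history of earlier steps.
        self.context = [f"Summary so far: {summary}", f"Task: {step}"]
        return self.model("\n".join(self.context))


def run_plan(task: str, planner: ModelFn, worker: ModelFn) -> List[str]:
    # 1. Plan first: the planner is assumed to return one step per line.
    plan_text = planner(f"Plan: {task}")
    steps = [s.strip() for s in plan_text.splitlines() if s.strip()]

    # 2. Implement each step in its own fresh sub-agent.
    results: List[str] = []
    summary = "nothing done yet"
    for step in steps:
        agent = SubAgent(worker)  # new, empty context for every step
        results.append(agent.run(step, summary))
        summary = f"completed {len(results)} of {len(steps)} steps"
    return results


# Stub models so the sketch runs without any real API.
def fake_planner(prompt: str) -> str:
    return "write parser\nwrite tests\n"


def fake_worker(prompt: str) -> str:
    return f"done ({len(prompt)} chars of context)"


if __name__ == "__main__":
    for line in run_plan("add a config parser", fake_planner, fake_worker):
        print(line)
```

In practice the summary would probably be something the planner or the previous sub-agent writes, not a step counter, but the shape of the loop stays the same.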

Ironically, good software architecture patterns (small functions, single responsibility) heavily affect how well these models perform, too. They do surprisingly well in well-architected codebases.

They do very poorly in anything that's a mess, where Opus and GPT 5.5 still get reasonable performance.

by oshrimpton 11 hours ago

Yeah, the benchmark for sure isn't perfect, and without super rigid prompting it is far too easy for it to get off course. A 28% hallucination rate isn't nothing either.