Hi, author here. I can't give an exact number for how many tokens the verification step took, but the verification GLM 5.2 ran was very stupid and definitely a waste of time: it read the pixel color data to try to verify that the scene rendered properly, which is really bad. Opus opened the game in a Playwright browser and took screenshots to verify the actual image, which helped a lot.

Pro tip: to get around this issue, you could have GLM 5.2 spawn a multimodal model as a subagent to verify the images.

reply
That's a dumb way to do it; it should just write the frame buffer to a PNG instead of taking screenshots. I guess you can't take the dumb web developer ways out of these models at the end of the day.
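For anyone wondering what "write the frame buffer to a PNG" could look like, here is a minimal sketch using only the Python standard library. It assumes the frame buffer is already available as tightly packed 8-bit RGBA bytes; the names `framebuffer_to_png` and `bottom_up` are hypothetical and are not taken from the project being discussed.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(tag: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, tag, data, then CRC-32 over tag + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def framebuffer_to_png(pixels: bytes, width: int, height: int,
                       bottom_up: bool = False) -> bytes:
    """Encode a tightly packed RGBA8 buffer as PNG bytes.

    Assumption: 4 bytes per pixel, no row padding. Set bottom_up=True when
    the first row in the buffer is the bottom of the image (as with
    OpenGL-style readback).
    """
    stride = width * 4
    if len(pixels) != stride * height:
        raise ValueError("buffer size does not match width * height * 4")

    rows = [pixels[y * stride:(y + 1) * stride] for y in range(height)]
    if bottom_up:
        rows.reverse()

    # Every scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(b"\x00" + row for row in rows)

    # IHDR: width, height, bit depth 8, color type 6 (RGBA),
    # compression 0, filter 0, interlace 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)

    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(raw))
        + _chunk(b"IEND", b"")
    )
```

Usage would be something like `open("frame.png", "wb").write(framebuffer_to_png(buf, w, h))`. Note that this only produces the image file; something still has to look at it.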
reply
I could be wrong, but I believe this is a non-vision model. Please weigh in to correct me, because I would love to be wrong.
reply
GLM 5.2 is text-only, not multimodal. Opus is multimodal.