"personality issues" I was able to tell that Opus 4.7 would take instructions more literally, which I appreciated once I calibrated my phrasing to be more precise (I often ask it to investigate issues; pre-4.7 it'd start making code changes instead of just giving a write-up). But I can see contexts where its handling of vague prompts would've just been worse.

by swingboy 14 hours ago

Looking forward to the results. Thanks for your work.

by gertlabs 11 hours ago

Appreciate that! Results are live: https://gertlabs.com/rankings

Opus 4.8 is the first tangible improvement since Opus 4.5. And it doesn't seem to have the personality problems of the last release -- I've been enjoying using it.

by swingboy 4 hours ago

Nice! Looks like it’s topping the two coding ones. I noticed it is absent from the Social Intelligence board though?

by gertlabs 1 hour ago

That'll populate over the next couple weeks -- those are the live games on the spectate tab which take a while to generate statistically worthwhile data. I'm curious how it does. From using it all day, I can say Opus 4.8 is my new favorite model, hands down.