Calling it SOTA might be a bit provocative, but what actually is the "state of the art"? We have benchmarks, but they are increasingly gamed and don't necessarily reflect a model's actual performance (see Opus 4.7). So I think it's useful to have real-world data from actual users as an additional data point.

by miyoji1 hours ago

Maybe you shouldn't be relying on something if you can't even tell how good it is?

by mellosouls 15 hours ago

That's pretty much exactly what the title says.

The technical abilities and usage are derived from the commenters' reflections on their own usage.

by swyx 4 hours ago

and assuming all mentions are coding-model mentions just because it's on HN