> Who are making these claims? script kiddies? sr devs? Altman?

AI agents, perhaps? :-D

> All anonymous as well. Who are making these claims? script kiddies? sr devs? Altman?

You can take off your tinfoil hat. The same model can perform differently depending on the programming language, the frameworks and libraries employed, and even the project. Context matters too: a model's output varies greatly depending on your prompt history.

It's hardly tinfoil to understand that companies riding a multi-trillion-dollar funding wave would spend a few pennies astroturfing their shit on HN. Or overfit to benchmarks that people take as objective measurements.

Keeping his ramblings on Twitter and the company blog in mind, I bet he is a shitposter here.