undefined

points

[-]

Not sure I follow. Anthropic included benchmarks where GPT 5.5 outperforms Claude 4.8. Sure maybe that is a strategic error, but that doesn't seems to indicate benchmarks can't be trusted (I personally don't trust them, but not because of this).

by aspenmartin21 hours ago|

prev|

[-]

Sorry how does their addition of GPT 5.5 in their blog post invalidate benchmarks? Also whether or not the marketing department decided to put it in a table benchmarks are an easy thing to measure independently