But then what about local models? There are hundreds of variations you'd have to test yourself. It's simply not doable unless it's your full-time hobby.
You need benchmarks to at least separate the wheat from the chaff, so you're left with only a few choices to test yourself.
their collective butts are already glued to the hype train as they chase numbers they (often) manufactured to justify the latest round of tech spend.
There are lots of good use cases out there - like the incredible progress with medical imaging analysis or complex system models for construction - and lots of crap use cases that need benchmarks to cosplay relevance.