What else can people do? Try the dozens of commercial offerings themselves? Okay, I suppose that's doable: you task one engineer with trying them one by one for a month. But then the next model drops and you start all over again...

But then what about local models? You have hundreds of variations to test yourself. It's simply not doable unless it's your full-time hobby.

You need benchmarks to at least separate the wheat from the chaff, so you're left with only a few choices to test yourself.

by subulaz 16 hours ago

a LOT of the people who love benchmarks are middle management hard-selling GenAI/LLM as magic tech sauce to vaguely technical executives who only want to know about the money, aka the headcount savings they so desperately desire.

their collective butts are already glued to the hype train as they chase numbers they (often) manufactured to justify the latest round of tech spend.

lots of good use cases out there - like the incredible progress with medical imaging analysis or complex system models for construction - and lots of crap use cases that need benchmarks to cosplay relevance.

by operatingthetan 16 hours ago

We need good benchmarks or we are just left following the hype train.