upvote
That’s true. it also depends heavily on the type of task, not everything is equally represented on the web today and it remains to be seen if this is going to change or not.
reply
Or hidden benchmarks, though it's then harder to get people to trust the results.
reply
How do you hide them if you aren't self hosting the model?
reply
The trust issue might be solved by having standardisation bodies created, similar to W3C or even TPC, although TPC didn’t end that well.
reply