by jameswhitford 12 hours ago

by wongarsu 11 hours ago

Tbf, most of the "real benchmarks" have issues that are just as bad. Assessing LLM performance is just hard.

by oceansky 6 hours ago

And personal too. Different engineers are using them for different use cases.

by ramraj07 2 hours ago

The important point is that your benchmark is pretty much irrelevant to actual usage. Thus whatever conclusion you draw is not just irrelevant but misleading.

by meander_water 11 hours ago

Thanks, I didn't mean to be brusque, but I have seen a lot of these vibe tests lately that come to grand conclusions like "X model is better than Y" from the result of a single prompt.

Appreciate you sharing the results of your tests though!

by jameswhitford 10 hours ago

I appreciate the feedback!