by jaggs 15 hours ago

by BoorishBears 2 hours ago

Their benchmark is chock-full of things like that: it's deeply flawed and essentially rates how LLMs perform if you exert yourself trying to hold them entirely the wrong way.
