by MattSayar 14 hours ago

by ACCount37 14 hours ago

Yeah, that's my point. Humans are not reliable LLM evaluators. "Secret model nerfs" happen in "vibes" far more often than they do in any reality.