[-]

I recognize the sarcasm, but the data I can find says it's performing at baseline?

[-]

Yeah, that's my point. Humans are not reliable LLM evaluators. "Secret model nerfs" happen in "vibes" far more often than they do in any reality.

[-]

This but unironically.

"I reject your reality, and substitute my own".

It worked for cheeto in chief, and it worked for Elon, so why not do it in our normal daily lives?