You've almost buffer-overrun Goodhart's Law into the McNamara fallacy (https://en.wikipedia.org/wiki/McNamara_fallacy). :]
SWE-bench Verified was created in collaboration with OpenAI. It's also an open dataset, so it's prone to contamination, meaning it can be gamed.