Huh, that is very curious indeed. If it's true that Anthropic claims that pass rate while OpenAI claims the test cases are flawed and broken, then clearly one of them isn't telling the whole story...
https://news.ycombinator.com/item?id=47911074
Citation for the claimed pass rates: https://llm-stats.com/benchmarks/swe-bench-verified