Hold on, I think this claim needs some hard data. Here you go, gentlemen:
https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5...
There might be a harness difference, but a CTF-type benchmark like this also might not fully capture the capability difference.