This AISLE benchmark is interesting in this matter: https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jag...
And the recently discovered Copy Fail by Xint code is another proof that the gating is overblown: https://xint.io/blog/copy-fail-linux-distributions
I'm not entirely up to date on each week's LLM hype train/scandal but last I heard there was no public access to it or public-trusted 3rd parties that can review model's capabilities
What would be really interesting is a side by side Claude Opus 4.7 and Mythos comparison.