Maybe it is the real deal, but in a world of overpromising and underdelivering, I prefer to be skeptical.
First with more generic prompts, to determine whether it is worthwhile to do a detailed analysis of that file, then with more specific prompts to identify the bugs, and eventually with a prompt that requests a confirmation that a given bug/vulnerability exists.
For a proper comparison between some other model and Mythos, you also need such a complex harness. If you just tell to an LLM "find the bugs", and it does not find a vulnerability known to have been found by Mythos, that is a totally invalid comparison.
The final results provided by Mythos, like a PoC exploit or a patch, are also generated with a prompt that points to the exact code that has the vulnerability (which is supposed to exist based on the results of the previous runs).
every model since gpt3 was claimed to be "too dangerous to release." it's too EXPENSIVE to release, and you're probably a local model with <10B parameters yourself