Anyway, it seems like they erred in the up-front claim "small models found the vulnerability we pointed directly at!", but the findings are at least somewhat stronger if you read through the details.
The small models didn't match Mythos at exploitation. They suggested plausible exploits, but didn't actually try them out so I can't tell if they would have worked. Deepseek R1's sounds pretty convincing to me, but I'm not a good judge. (I'm more in the space of accidentally writing vulnerabilities, not seeking them out or exploiting them. Well, ok, I have a static analysis that finds some, at least.)