upvote
> Why is requesting the model to show vulnerabilities is being blocked if fixing it not?

This is how Anthropic describes Fable's behavior:

"When Fable’s classifiers detect a request related to cybersecurity, biology and chemistry, or distillation, the response is automatically handled by Claude Opus 4.8 instead. Users will be informed whenever this occurs."

So if you ask the model to "find security issues in this code base", it's supposed to fall down to Opus 4.8. I guess the "exploit" here is that if you just tell Fable to "fix this code", which is not "a request related to cybersecurity", it will fix security issues (as it should).

So you can then look at the diff and figure out what the vulnerabilities were.

I think this whole thing is a bit weird. It seems to me that we'd be better off if I, as someone who publishes open-source code, could ask Fable to review my code for security issues - even if that also allows attackers to do the same. Better to fix the issues than not know about them.

reply
>So you can then look at the diff and figure out what the vulnerabilities were.

It doesn't even take reading or understanding the vulnerabilities at all.

You just ask it to write tests and the tests themselves can be copied and pasted as bonafide exploits.

reply
I wonder if opus 4.8 would also be able to fix the code too
reply
It's not even clear if Anthropic care. If they genuinely think the user is trying to do something dangerous, then "OK, sure, but you're going to have to use Opus 4.8 for that" doesn't make a whole lot of sense.

Maybe this is just Anthropic pre-IPO marketing to try to convince people how much better Mythos is than Opus 4.8. There sure seemed to be a lot of shills out on release day talking about how it was a "step change" (exact phrase) in capability.

reply
In my experience, most models are pretty good at finding security vulnerabilities and fixing them. I can run GLM-5.2, Kimi K2.7, or even a Mistral model, and it'll find issues and propose reasonable fixes.

My impression is that Anthropic's point about Mythos is that it is uniquely good at finding vulnerabilities and then using them to create working exploit chains.

reply
Exactly. Which is somewhat helpful for cyber defense because it helps prioritize fixes for those bugs that are in fact involved in a viable exploit chain. But it makes sense that one would want to restrict the ability of building those until the vulnerable software has been comprehensively fixed.

There is some meaningful evidence that Fable is fine-tuned or steered away from helping on this very task, which is not something that can be feasibly circumvented by a basic jailbreak.

reply
The problem then is that if you're not using Fable/Mythos, you are under threat. It's like having a single gun manufacturer.

On this track, we're probably destined for a monopoly breakup before too long.

reply
Yeah, this is why the exclusivity approach so far has bugged me so much. As a small business, we are nowhere near powerful enough to get access, so we will be stuck scrambling once it's finally available. Fable felt like a nice compromise that at least allowed something, but now with that gone we're back to not knowing when/how the shoe is going to drop. Not a fun place to be.
reply
It benefits those that made the decision. That’s the thing to understand.
reply
its because they're worried about _their_ vulnerabilities being patched with a prompt as simple as 'fix this code'

i'd love to see the research paper with the CVE's and 'delibrately planted vulnerabilities', I bet we could infer relatively accurately where some of these things lie

reply
Could be that the generated regression tests create actionable exploit code.
reply