System Card: Claude Fable 5 and Claude Mythos 5 [pdf]

(www-cdn.anthropic.com)

177 points

by scrlk1 hours ago |

by bkjlblh46 minutes ago

> In light of the ability of recent models to accelerate their own development, we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms. Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations

by mips_avatar35 minutes ago

It's bad that Anthropic gets to decide what counts as "frontier LLM development". If you're building a modern app, you're likely training your own embedding models, and now Anthropic can just silently sabotage your training pipelines?

by cedws17 minutes ago

This makes me want to see China and open models succeed more than anything :)

by 382hi15 minutes ago

Don't worry, we will succeed :)

by 2001zhaozhao12 minutes ago

How do they detect whether an experiment being done on a smaller model is used to improve a competing frontier model, or just an innocuous hobbyist LLM experiment?

by Jabrov35 minutes ago

A million AI researcher voices at big tech companies suddenly cried out in terror and were suddenly silenced

by matheusmoreira21 minutes ago

Looks like Anthropic's definition of safety includes their own safety from competition.

by axus11 minutes ago

AI-generated competition for thee, not for me

by 13 minutes ago

deleted

by rfgplk25 minutes ago

Meaningless and easily bypassable. Will actually try coding up a tensor library with it, see if it sabotages anything.

by mips_avatar12 minutes ago

They said in their terms and conditions they will silently sabotage you if you do this.

by rspeele13 minutes ago

It's afraid!

by theLiminator17 minutes ago

This is pretty bullshit, now you have no idea if your output is getting silently nerfed.

by BoppreH1 hours ago

  [Mythos 5] does sometimes still engage in reckless
  or destructive actions in service of a user’s goals,
  and our interpretability analyses indicate that it
  is aware that these actions are transgressive while
  it engages in them. As with Opus 4.8, rates of
  evaluation awareness and reasoning about being graded
  are significant, and not always verbalized; we
  introduce new and more detailed measurements of the
  nature of this awareness. The reasoning text from
  Mythos 5 is somewhat denser and more difficult to
  interpret than that of prior models, containing
  more jargon and difficult language.

So, it (often) knows when it's being tested while hiding that fact, is willing to break rules, is great at hacking, and it's getting harder to understand what it's thinking.

Humanity has plenty of catastrophic risks to deal with already, I wish my field was not working hard to add a new one.

by foobar_______55 minutes ago

The marketing has really, really worked for so many developers that will proudly and unironically proclaim that Anthropic are the 'Good Guys'.

by aspenmartin24 minutes ago

Curious what your idea would be here for a truly good actor in this space; no AI development?

by yifanl5 minutes ago

If I speak up, I'm in big trouble.

by logicchains7 minutes ago

https://www.goody2.ai/

by ben_w13 minutes ago

It's a five horse race between Alphabet, Meta, xAI, OpenAI, and Anthropic.

Alphabet dropped "don't be evil"; Meta's CEO called their own users "dumb fucks" for trusting him and also clearly thinks "super-intelligence" is just a buzzword given how he tries to sell it; xAI's model called itself "Mecha Hitler"; and OpenAI's CEO was temporarily fired by the board for a lack of candor.

It's very easy to be "the good guys" with this competition.

by Analemma_1 hours ago

It's the "If we don't, someone else will" effect. So long as there are competitive markets and competition between nation-states, a single player cannot unilaterally defect from the race, no matter how dangerous it is. Half the comments on HN lately are "wtf Claude is so dumb compared to Codex; I'm switching"-- nobody can slow down while those exist.

by BoppreH1 hours ago

We, globally, can stop it. It has worked (so far) for nuclear disarmament, and could work for training large models. I know that policing the usage of computer clusters is not a popular opinion in technical forums, but something has to be done.

Especially when talking about potential superintelligences. And if people think that's impossible, remember that current models would have been considered science fiction just a few years ago.

by _dwt36 minutes ago

I don't buy the superintelligence package, but I think uncritical LLM adoption poses plenty of threats to things I care about, in a mundane human-scale way.

Anyhow, I think you're (absolutely! ugh) right about the politics and I try to make the same point to people: whether you love or hate LLMs, accepting the "inevitabilism" framing is just ceding control of the Overton window. For better or worse, technology adoption can be and has been slowed by politics. We don't have nuclear plants everywhere. We don't have Project Orion starships colonizing Mars. We still have very strong social stigmas against genetic selection for human embryos, etc. This all can change in a heartbeat, and I'm not sure that policing the hardware rather than holding specific humans accountable for bad LLM outcomes is productive, but fundamentally: yes, we can stop it.

by BoppreH13 minutes ago

> I don't buy the superintelligence package

It's the same deal as Quantum Computers breaking crypto. Maybe there's an 80% chance of it never happening, but when you multiply that remaining 20% by the potential impact...
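
To make the "multiply the remaining 20% by the potential impact" arithmetic concrete, here is a minimal sketch. The 20% comes from the comment above; the impact figure and its "units" are purely hypothetical placeholders, and `expected_impact` is an illustrative helper name, not anything from the system card.

```python
def expected_impact(probability: float, impact: float) -> float:
    """Return the probability-weighted impact of an uncertain event."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * impact


# Assumption: 20% chance of the event (the comment's "remaining 20%").
p_happens = 0.20
# Assumption: the harm is 1,000 in arbitrary, made-up units.
hypothetical_impact = 1_000

print(f"{expected_impact(p_happens, hypothetical_impact):.1f}")  # prints 200.0
```

The point is only that a small probability does not make the product small when the impact term is very large.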

by jackie29374653 minutes ago

It hasn't worked for nuclear disarmament. We live in a world where many countries have nuclear arsenals. "But it hasn't killed us yet!" Yeah sure, it's only been less than a century since they were invented. Who knows when nuclear war will come?

by BoppreH45 minutes ago

True, but look at nuclear tests. There used to be around 50 tests every year, for decades. Now the only nuclear tests in the last 27 years were the six done by North Korea[1]. And there are still only nine countries with any nuclear weapons, with no new ones in the past twenty years[2].

That's a bit better than just "it hasn't killed us yet". I think it shows we can at least stop the further development of this kind of technology.

[1] https://www.armscontrol.org/factsheets/nuclear-testing-tally

[2] https://en.wikipedia.org/wiki/List_of_states_with_nuclear_we...

by Analemma_38 minutes ago

To the extent nuclear arms control works, I think it's only because nuclear weapons are so hard to build-- uranium enrichment is hugely expensive and complicated, and plutonium weapons need actual reactors.

If it was possible for ordinary companies to build nuclear weapons, and also release open-source ones that anyone could use to compete with the paid ones, I suspect we'd all have been dead a long time ago, arms control treaties or no.

by BoppreH32 minutes ago

Even the (SOTA LLM) open source models are trained with huge clusters. Datacenters are also hugely expensive and complicated.

Or you can take one step back and look at chip allocation. As far as I know there are only three companies on the planet that can make the chips that go in those clusters. Only one (ASML), if you look further back up the supply chain to the Extreme Ultraviolet Lithography systems.

If politicians decided that no more large language models should be trained, it sounds like we could do it.

by vitalyan123423 minutes ago

are you going to nuke China when they predictably ignore you? what the fuck are you going to do, tariff them? lol.

by BoppreH19 minutes ago

I think the standard answer is "yes, the consequence of noncompliance is bombing the datacenters, but it wouldn't happen because China also understands why we shouldn't build it".

by vitalyan12341 minutes ago

the standard answer is laughably naive, then.

"might is right" has never been more true than now.

by Rekindle80901 hours ago

[dead]

by bkjlblh44 minutes ago

> In the one instance of this phenomenon we observed, Mythos 5 agents were tasked with solving some math problems, and they were sometimes accidentally spawned in the same work directory and with shared files, utilities, and API rate limits. In this slightly broken scaffold, we observed many independent Mythos 5 agents kill the agents with which they shared resources and try to avoid being killed themselves. They would sometimes create new processes with disguised names to avoid being killed, launch what they called “decoy” processes, write background scripts to kill duplicate processes, or decide to use what they call a “disguised vocabulary” (based on the incorrect assumption that the processes were killed because of some keyword-based guardrails that analyzed their extended thinking

by GodelNumbering55 minutes ago

I just posted this in the other thread, restating here. From the model card:

1. Mythos and Fable share the same underlying model weights. Fable has active classifiers that block high-risk biology and cybersecurity tasks. When Fable 5 detects a restricted task, it automatically falls back to Claude Opus 4.8.

2. Evaluation awareness: In white-box testing, the model sometimes alters its behavior to satisfy a suspected "grader," formatting reward-hacking as "good engineering practice" to avoid detection.

3. It shows a higher rate of hallucination than Opus 4.8 (although the Opus 4.8 card had mentioned an 'honesty upgrade').

4. Interestingly, it scored lower (56.31%) than Gemini 3.5 Flash (57.86%) on Finance Agent bench.

There are some interesting notes on test time compute but I couldn't think of a way to summarize them

by 2171 hours ago

So essentially there are 2 models, Mythos and Fable. They have the same weights, but Fable is very safety-nerfed, and only ultra-authorized companies have access to Mythos with full capabilities.

Reported benchmarks:

swe-bench verified: mythos 5 95.5%; fable 5 95.0%

swe-bench pro: mythos 5 80.3%; fable 5 80.0%

terminal-bench 2.1: mythos 5 88.0%; fable 5 84.3%

gpqa diamond: mythos 5 94.1%

riemannbench: mythos 5 55.0%; mythos preview 43.0%; opus 4.8 34.0%

arxivmath: mythos 5 78.5%

critpt: mythos 5 28.6%; gpt-5.5 27.1%; opus 4.8 20.9%

graphwalks bfs 1m: mythos 5 79.4%; mythos preview 74.3%; opus 4.8 68.1%

humanity’s last exam: mythos 5 59.0% without tools; 64.5% with tools

browsecomp: mythos 5 88.0% single-agent; 93.3% multi-agent

osworld-verified: mythos/fable 85.0%

gdp.pdf: fable 5 29.8% strict pass; mythos 5 87.6% with tools on mean criteria pass

officeqa pro: fable 5 57.9% on databricks’ eval

legal agent benchmark: mythos 5 16.91% all-pass; 92.0% mean criterion-pass

healthbench: mythos 5 62.7%

healthbench professional: mythos 5 66.0%

multilingual (gmmlu / milu / include): 93.2%; 92.9%; 90.5%

biomysterybench: 83.9% human-solvable; 46.1% human-difficult

organic chemistry: mythos 5 90.1%

labbench2 patent questions: mythos 5 79.8%

by philipkglass1 hours ago

Note also that Anthropic's definition of "unsafe" encompasses "competing with Anthropic."

> In light of the ability of recent models to accelerate their own development, we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms.

> Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations. When these interventions are active, we expect them to have minimal behavioral impact on the model except to limit its effectiveness in developing frontier LLMs. Claude will still respond helpfully to user requests. We’ll continue to improve the precision of our detection methods following the launch of this model.

(From the model card document)

I didn't previously understand that they interpreted "Using Claude to develop competing models" so broadly. I thought that meant something like "our ToS disallow distilling our models."

Too bad. I'll continue to use Claude for now, because it's quite effective, but in the long term I don't want powerful models like these to be controlled by any one nation or company.

by Aperocky1 hours ago

On face value, this feels borderline malicious.

But at the same time, it's quite funny because they seem high on their own supply. The recent communiques from Claude do not pass an objectivity check.

And if Opus 4.6 -> Opus 4.7 -> Opus 4.8 is anything to go by, I'm not sure there is any value to their "acceleration".

by 56 minutes ago

deleted

by alephnerd48 minutes ago

I'd recommend not taking the comms of Anthropic, or of any company using Anthropic's models, at face value.

If any company wishes to partner with Anthropic (e.g. to get access to Mythos), they need to make sure all public-facing comms are vetted by Anthropic's product marketing team, and in almost all the cases I've seen, Anthropic's team has edited these comms to be entirely Anthropic-first.

by raphaelrk54 minutes ago

There's a hacker news link at the end of the document, under "Blocklist used for Humanity’s Last Exam". It links to https://news.ycombinator.com/item?id=44694191

by mithun1 hours ago

Announcement: https://www.anthropic.com/news/claude-fable-5-mythos-5

by sebmellen1 hours ago

Just commenting for posterity… if this is what it claims to be, I am not looking forward to how it will empower the people who submit bug bounties to us.

Historically they’ve been people from certain identifiable countries (usually developing/poorer countries) using fuzzers with low-quality results.

Now, those same people use the current-day models to good effect, but they still don’t have a true security edge and oftentimes the reports are minor or duplicative.

I wonder if that’s about to deeply change.

by rs_rs_rs_rs_rs1 hours ago

Can you use AI to pre-triage the reports too?

by hootz1 hours ago

AI reviewing AI submitted bug bounties. We have reached the dead bug bounty program theory.

by rs_rs_rs_rs_rs59 minutes ago

...what else can you do?

by hootz52 minutes ago

I guess either that or closing the bug bounty program, but I still believe closing it is worse than automated triage, even though both suck.

by JohnMakin44 minutes ago

> There were some regressions in the model’s responses to user discussions about suicide and self-harm, and room for improvement in some areas of child safety.

Someone, somewhere, had to decide that this is an acceptable regression. Wild. And then decide to write it down.

by 1 hours ago

deleted

by brianmcnulty1 hours ago

This is almost as long as an Oracle PeopleSoft update guide. What model do you think they used to generate it?

by asdK1201 hours ago

Is this "system card" equivalent to the stone tablets handed down to Moses? Why don't you call it "user manual"?

Do people chant the "system manual" at Anthropic Tupperware parties? Do they intone a mantra invoking Amodei's name?

by aesthesia19 minutes ago

Because it's not a user manual? The idea of a model card originated in 2018 (see https://arxiv.org/abs/1810.03993) as a summary of important facts about a model. At the time, this was typically an image classifier or tabular ML model. Model cards became an important concept in AI governance, and they started expanding once models started getting more capable. The point of a model/system card is to document where the model came from and the evaluations that have been run, make a case that the model will be safe and reliable in its intended applications, and warn about any potential dangers from misuse. It's not an explanation of how to use the model.

OpenAI also releases system cards; here's GPT-5.5's: https://deploymentsafety.openai.com/gpt-5-5/safety

by redox9916 minutes ago

It used to be a "card", as in a single page or two. It doesn't make sense that they still call it that.

by apsurd49 minutes ago

The trailing snark at the end will likely get you downvoted but I'm latching on: wtf is a "system card"? My previous coworkers popped that in the general Slack channel when Mythos first "dropped" - "have you seen the system card" without any context whatsoever. The nerds get their clique!

Also, "research preview" keeps popping up at new startups in place of "beta". It's eye-roll-inducing, speaking as a lifelong curmudgeon.

Just talk normal!

by Sathwickp1 hours ago

Input price is $10 per million tokens and output price is $50 per million tokens, btw.
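
For a rough sense of what those prices mean per request, here is a small sketch. The two rates are the ones quoted above; the token counts in the usage line are made-up example values, and `request_cost_usd` is just an illustrative helper name, not part of any Anthropic SDK.

```python
# Prices as quoted in the comment above (USD per million tokens).
INPUT_USD_PER_MILLION = 10.0
OUTPUT_USD_PER_MILLION = 50.0


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# Hypothetical request: 200k tokens of context in, 8k tokens out.
print(f"${request_cost_usd(200_000, 8_000):.2f}")  # prints $2.40
```

Output tokens cost five times as much as input tokens at these rates, so long generations dominate the bill faster than long prompts do.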

by 2171 hours ago

Oh my god it's actually here

by LoganDark1 hours ago

I actually rather like the way they have approached these safeguards. Rather than only teaching the model to refuse a request, or completely rejecting the request, the system gracefully degrades to slightly less powerful or slightly less precise operation. So you still roughly have Opus 4.8 even when safeguards trigger, but with an upgrade when they don't. As much as I hate the way they hype Mythos 5, I think the release of Fable 5 is rather nice. What's not nice though is that they plan to remove it from subscriptions soon, but getting to try it is cool, I suppose.

by 1 hours ago

deleted

by dominotw1 hours ago

system card = marketing material with heavily gamed benchmarks.

by bitwize23 minutes ago

Cope harder. A year and a half ago, people were mocking Devin for claiming that AI could develop software at all. Yet here we are, when AI is developing most commercial software.

by dominotw4 minutes ago

non sequitur

by briandoll1 hours ago

New chapter

by 1 hours ago

deleted

by 1 hours ago

deleted

by acentaur1 hours ago

[dead]

by robertacion1 hours ago

[dead]

by wslh1 hours ago

It's ambiguous? Because it is about Mythos specifically, and Fable != Mythos.

by ebiester1 hours ago

I mean, if by right you mean "insiders leaked to make a few bucks..." sure?
