Anthropic's guardrails seem to be more about protecting their business (distillation) than about public safety.
Just before asking for approval to run, it said one thing it wanted to "flag before running" was "Rate-limit and auth testing against prod will generate some 4xx noise in Railway logs and could trip the form rate limiter — harmless, but saying it now."
OK, fine. I said go for it, and it said:
"Running it. Quick recon first (prod URLs + the prior-findings baseline), then I'll fan out the audit tracks with adversarial verification."
Immediately after, I got the Fable warning saying it couldn't continue because of safety concerns and was switching to Opus. In the end, Opus did a good job thanks to whatever Fable suggested doing. Things were fixed that Opus missed in a security/performance audit just the week prior. But what surprised me is that it used 55 agents and burned 80% of my 5-hour window in 15 minutes (5x Max plan). I've never had Opus do that before on these audits.
The answer is the organization making the powerful tool: the people in charge of Anthropic.
Not only that, but they've also written at length about exactly what their opinions and values are: https://darioamodei.com/
You may not agree with the decisions that they make, but they're hardly mysterious. Not something to wonder about.
This whole business just keeps getting dumber.
1: https://darioamodei.com/post/policy-on-the-ai-exponential
Frontier AI models, like airplanes, should
be required to go through technical testing
and auditing, and their release should be
blocked or reversed as a threat to public
safety if they do not meet high standards
of safety. I am grateful to see the Trump
administration’s Executive Order move
incrementally towards a greater role for
government in AI, though Anthropic’s proposal
recommends even further action.
They are all-but-literally sucking up to the administration that declared their company a supply-chain risk, arguing that the same administration should be given gatekeeping authority over all high-quality LLMs including open-weight releases. Go gaslight somebody else.