upvote
Claude Opus 4.6 and 4.8 find vulns in source code just fine and 4.6 will pentest without source for you given a proper harness WITHOUT jailbreaking. WITH jailbreaks, you can probably imagine what they are capable of.

Anthropic guardrails seem to be more about protecting their business (distillation), than they are about public safety.

reply
public safety is downstream of distillation. If you can distill claude, then no amount of guardrails on claude will protect you from what someone can do with it.
reply
Distillation is not a thing unless you actually have the model weights. What people misleadingly call distillation is just training on chat logs, which has always been routine practice in the industry. There's a reason why every model today talks like early releases of ChatGPT.
reply
I asked it to analyse my architecture and find any security issues and it did it perfectly, first identified the issues & then fixed them. Not sure why my prompt managed to get through the guardrails
reply
I asked Fable to plan a security & performance audit of my website. It said it would check SSR & origin attack surface, CMS content injection, Strapi API surface, etc.

Just before asking for approval to run, it said one thing it wanted to "flag before running" was "Rate-limit and auth testing against prod will generate some 4xx noise in Railway logs and could trip the form rate limiter — harmless, but saying it now."

Ok fine, I said go for it, and it says:

"Running it. Quick recon first (prod URLs + the prior-findings baseline), then I'll fan out the audit tracks with adversarial verification."

Immediately after, I got the Fable warning about how it can't continue because of safety concerns, switching to Opus. In the end, Opus did a good job thanks to whatever Fable suggested doing. Things were fixed that Opus missed in a security/performance audit just the week prior. But what surprised me is that it used 55 agents. Burned 80% of my 5-hour window in 15 minutes (5x Max plan). I've never had Opus do that before on these audits.

reply
exactly for cybersecurity the failure was visible. It was not visible for "Frontier" ML Research. The argument of headstart in it security is no feasible here.
reply
I wonder who gets to decide which companies make important and critical software and which ones get the scraps later.
reply
No need to wonder.

The answer is, the organization making the powerful tool. The people in charge of Anthropic.

Not only that, but they've also written at length about exactly what their opinions and values are: https://darioamodei.com/

You may not agree with the decisions that they make, but they're hardly mysterious. Not something to wonder about.

reply
That would be Anthropic.
reply
Well, Anthropic thinks it should be the Trump administration [1].

This whole business just keeps getting dumber.

1: https://darioamodei.com/post/policy-on-the-ai-exponential

reply
Read the actual essay. I cannot possibly imagine how you come to that conclusion unless you're just arguing in bad faith.
reply
No. You read the actual essay, then explain how we're supposed to interpret this more charitably:

    Frontier AI models, like airplanes, should 
    be required to go through technical testing 
    and auditing, and their release should be 
    blocked or reversed as a threat to public 
    safety if they do not meet high standards 
    of safety. I am grateful to see the Trump 
    administration’s Executive Order move 
    incrementally towards a greater role for 
    government in AI, though Anthropic’s proposal 
    recommends even further action. 
They are all-but-literally sucking up to the administration that declared their company a supply-chain risk, arguing that the same administration should be given gatekeeping authority over all high-quality LLMs including open-weight releases. Go gaslight somebody else.
reply
This is a pretty reasonable statement and I'm not sure how you could interpret this as "sucking up to the admin."
reply
It's a pretty reasonable statement if you work for Anthropic and are eyeing your stock options nervously and your competitors even more so.
reply
Everyone that isn't a bitter cynic must be a shill.
reply