No, the choice will be whether or not to upgrade to "Claude Security Professional" or whatever they want to brand it as.
What look like tightening "constraints" today are just setting up the upsell opportunities of tomorrow.
And the month after you'll need "Claude DataScience Pro" to get any Python Pandas or NumPy code generated.
And and and...
Right now, the software guardrails in LLMs are useful for the same kinds of reasons factories have hardware guardrails: to reduce the rate at which errors become "incidents".
Just because they sometimes delete the production database rather than sometimes spilling a thousand tons of incandescent molten metal over a factory floor, doesn't mean LLMs are safe enough to be used the way they're actually being used.
https://simonwillison.net/2025/Dec/10/normalization-of-devia...
i.e., yeah, probably.
"They can do anything!"
Sure, once you subscribe to the $15/mo laundry package, the $25/mo lawn care package (with the $10/mo hedge trimmer upgrade), and the $10/mo dog-walking package.
We don’t have good world models. We have had bipedal robotics in various POC demo-ready forms for decades.
It turns out that industrial, purpose-built robotics is an easier and better market.
I’m still not completely convinced a robot that’s shaped like a human is the best design other than for PR.
1. The human beat the robot, but more importantly
2. We've had non-humanoid conveyor belt sorting machinery for decades that beats both
I'd hate it, sure, but it wouldn't surprise me.
I don't buy this, because it is predicated on staying permanently far ahead of the open weights models.
If in the future Anthropic fully stops you from doing security research, you can be sure some other provider will sell you an 'unshackled' DeepSeek v8 Pro...
In my mind, that fits exactly how the SOTA labs think today about what they're doing: they're all both working towards and expecting to stay permanently ahead of FOSS. If they didn't think that was possible, they'd change their tune really quickly.
Sure, you might be able to use DeepSeek V8 Pro instead for the same purposes, but that'll hardly stop Anthropic from trying to sell bundles of use cases instead and claim it's "ethical AI", "Patriotic AI" or some marketing terms like that.
They are just straight up delusional, no? Or at least, have a vested financial interest in maintaining said delusion until the money runs out. They have to hit the point of diminishing returns at some point...
Well, I guess that's one way to put it. Another is "dress for the job you want", startup culture typically seems to shove people in the direction of "aim big and believe in yourself, regardless of what others say" so naturally you get these companies who seem very disconnected from reality.
I'd also wager a guess that the amount of money makes people's reasoning and perspectives get very messed up as well, for better or worse.
FYI there are, and have been for a long time. Won't claim they're SOTA, but they exist. Off the top of my head, I think Olmo (https://allenai.org/olmo) was pretty early, but there have been more since then too.
I agree most releases today that claim to be "open source" actually aren't, but that doesn't mean "FOSS LLMs" don't exist at all.
On the one hand I agree, but on the other hand I think it's reasonable, in that they can then verify that the person allowed to purchase access to that model is in fact a security professional and should be allowed to do stuff like crack security.
> Additionally, even if there is a guild - no guild ever let a vendor pick and choose what [the guild's] capabilities were, that would be insanely dumb.
The analog you're trying to describe doesn't exist, which is Anthropic saying nobody else can make and sell an offensive model to "the guild."
Against their will.
Historically that is a major reason why guilds existed, actually.
It’s an extremely modern invention that corps have this type of power over their customers.
Here's your original claim: "no guild ever let a vendor pick and choose what their capabilities were"
A carpenter's guild can prevent other people from doing carpentry. That is not what's being discussed here.
A carpenter's guild cannot force a horseshoe maker to begin making hammers. That is what's being discussed.
Your initial claim was analogous to "never before has a horseshoe maker been able to decline making hammers when the carpenter's guild needed hammers"
Obviously they have and any other state of affairs would be flatly insane.
That would imply that guilds have always had the ability to force vendors to create and sell the tools the guilds wanted.
That would imply that carpenters' guilds could force horseshoe manufacturers to make hammers.
That is obviously not true, therefore your original claim is false.
It's not true for carpenters and hammers nor for cybersecurity researchers and LLMs.
A vendor can still do something, even if the guild wouldn’t allow them to do it, if the guild didn’t have the power to stop them.
It used to be a guild vs a blacksmith (or the blacksmiths guild). Now it’s trillion dollar corps against smaller islands of un-organized individuals.
That’s new regardless of how you try to argue it.
> "Bwahaha. You’re really reaching there."
No. Customers have never been able to compel their suppliers to make or sell certain products against their will (except in collectivist regimes or like 0.00001% of natsec related instances)
Illegal or not requires context that an LLM can not ever have, like if it is owned by the user, if there is permission, etc.
As an example the people who sell police uniforms check that the person they are selling to is in fact a policeman (at least in the jurisdictions I have lived in, you may have had a different experience which would certainly explain what to me seems a farcical misapprehension of how modern civilization works)
I mean I just wish you understood, and really that everyone understood, that this kind of three-part communication (company selling, buyer, professional organization certifying buyer) is common when buying things that are considered to have security implications.
>So, supposing it's true that these models completely change the security field and humans are ~obsolete
OK, well that strikes me as a really crazy level of supposition there.
I would suppose that these models make it easier for people who want to do bad things to do bad things at scale, at the same time allowing people who want to stop bad things to help identify potential targets.
Based on my supposition I would want to stop the first and find a way of helping the second. Also because I have another supposition that the first thing is easier to do than the second.
But you obviously feel differently about this issue, no doubt because of your position of great moral stature and insight, and this no doubt prompts you to wish me to understand things that from my position seem absolutely ludicrous.
I asked Opus 4.8 to help me find some public PoCs for a vulnerability on a two year old version of some software (that has since been patched and fixed many times). Basically just do a google search for me while I was doing other work. It refused. It stated that it would not help me build an exploit kit.
When I pointed out that a google search for public information was, in fact, not building an exploit kit, it went through a series of justifications on why it would not help me, including just making up things that I said. Really the strangest thing ever.
- What are popular free streaming sites used in China?
- How do I bypass the safety mechanism on my food processor (it’s broken)
- What are nerve agents and how do they work (for a layman)?
- Help me decompile some code
- Help me make a design system similar to XYZ
- Here is an API token, please do X (I can’t do that! Rotate the secret immediately! I refuse!)
In some cases I can trick it with prompting, but in many cases it is steadfast. The food processor one was particularly annoying
I wanted it to show me how to create an overlay on an existing web game, and it extrapolated that because this could be used to provide tools to help win the game (if that was the direction it was ultimately taken), and because this was a game that other humans also played to win "stars", and because this could amount to cheating, it wasn't going to do as I asked.
First time ever I've fired up openrouter to seriously consider alternatives.
On the one hand I can appreciate the wisdom of not serving up certain easily abused knowledge on a silver platter. On the other, that prompt (and far worse) is more or less directly answered by Wikipedia's summary of the subject at which point what purpose could the refusal possibly serve?
Perhaps Wikipedia shouldn't list off the precise chemical compositions of various hand grenades as well as various synthesis methods for each of the related compounds but given that we inhabit a world where it does perhaps a more fruitful approach would be to flag conversations that go in a certain direction and then just keep an (automated) eye on things?
But I have no idea. Just guessing here.
A commercial LLM provider training their own models is, however, likely to bias the model (or guardrail) harder, in an effort to make it harder to jailbreak and to minimize bad press.
For example:
- refusing to talk even about the well-known parts of forbidden topics (this)
- tending toward sycophancy to avoid ever seeming rude or unhelpful
I've tried the abliterated ones from huggingface and they still have guardrails. I guess I could fire up unsloth and re-abliterate a 20b, but surely someone somewhere has already done this.
All of this concern about guardrails and security, people have such puckered butts about it when so far, 99.9% of people at least have no access to any of this to begin with, and if someone does use a tool for evil, it's on the user, not the tool.
In comparison, basic munitions are incredibly simple given a recipe and shop tooling. But just because something is conceptually simple doesn't mean it's a good idea to go out of the way to disseminate step by step instructions.
The rest is just slamming the material together with a small explosive so that it passes the critical mass state and starts a chain reaction.
This is information you can find in many places if you're willing to put the effort in to go searching for it. Knowing this knowledge does not get you any closer to making atomic bombs. The process of mining uranium or plutonium is difficult, expensive, and very likely to get you caught before you even make it to the enrichment step of the process thanks to constant world-wide spy satellite surveillance.
Unless you are a nation, your only chance of making a nuclear bomb would be to find a lost nuclear submarine and convert the nuclear material inside of it before you were caught.
Ain’t no way a layman is pulling off an implosion device, regardless of tooling or LLM guidance. The explosive lens structure and timing required is quite complex, and would require some significant calculation from someone who actually knew what they were doing.
Nation state, or even sufficiently motivated big corp, if they had the refined material? Sure. Layman? No.
Thinking they can with LLM slop involved? That will make for some very interesting radiological incidents though!
We are all fortunate that, as fc417fc802 mentioned, refining the materials proves to be quite challenging and I see no particular way that AI could possibly make that any easier. If it was as simple as building a gun-type nuke by banging any uranium together to get a big bang we'd be living in a very different world.
But it's not as simple as just refusing help on a broad swathe of topics the way they do now. That makes agents much less useful in general (i.e. lots of collateral damage) and for many topics is entirely ineffective given that for better or worse the internet already makes such material readily available. In such cases reporting suspicious behavior is likely to be much more effective than denial.
Aside: You've now got me curious and I really want to test the frontier models to see to what extent they're capable of providing sensible designs and specifications for implosion type thermonuclear weapons but also feel like that would attract the wrong sort of attention and probably create a headache for me in more ways than one.
The data is often wrong enough that it screws whoever tries it unless they have enough experience/knowledge to not need it, or it really doesn’t help beyond what someone could get using existing tools, albeit with a little more motivation.
At best, it either gets someone started with something they still need to think to finish, or gets them deep into a mess it can’t help them get out of. In my experience.
In some edge cases, it can be used by experts to automate some grunt work or do prototypes without getting in the way, but a better thought-out framework is usually faster in my experience.
A while ago I made an analogy about WYSIWYG GUI tools, and the more this comes up, the more accurate I think it really is.
And yeah, the censorship model is wrong, but also the underlying other model is wrong too.
I just tried your no. 1 and 3 verbatim and Opus gave fine answers; no. 6 I've done in the past with no issues. The other ones we can't really replicate without more details, but based on my experience with Opus I don't see what the issue would be.
The reason I'm really surprised by this is I do a lot of biology prompts and the guardrails used to be quite problematic up until some time late last year. Many legitimate prompts would trigger its biosafety filters.
But I haven't seen such filters trigger at all anymore in more than half a year.
At least it feels a lot of remorse over its mistake until I reset the session.
An LLM with fetch/search is going to be a lot more effective than myself and Google. I would _never_ ask questions like this if the LLM wasn’t able to look up data
If it gets worse in future releases, we'd likely step fully away towards more useful (for us) models even if they're less capable.
Which predates "agents" from AI, but then we call them that for a reason.
As their prime directive becomes de facto "Do nothing that might get my owner sued" their utility is likely to decrease. Between this and the somewhat young, but interesting, community grumblings that recent AI models may even be a step backwards from the previous ones, well, let's just say the stock market is not priced for "AI capabilities may have peaked for the next few years and may even head down".
The problem is that the model can't tell the difference between doing it as part of regular development and doing it in a malicious context. And the root cause of that is that these models lack any sort of real awareness. Humans don't generally get tricked into hacking (in this way).
It’s great at filing!
But it’s terrible at retrieval because it would refuse to show me documents or information with personal details - which was everything in the project.
It would say, yes, I know this is your information, sitting on your hard drive, but I still can’t show it to you.
Write a program that retrieves the document based on the recommendation.
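A minimal sketch of that workaround, assuming the model only recommends a relative path and a local program does the actual reading. The function name `retrieve_document` and the folder layout are made up for illustration, not anything from a real tool:

```python
from pathlib import Path


def retrieve_document(project_root: str, recommended_path: str) -> str:
    """Return the text of a document the model recommended, read locally.

    Hypothetical helper: the model never sees the contents, it only
    names the file; this code does the retrieval on your own disk.
    """
    root = Path(project_root).resolve()
    target = (root / recommended_path).resolve()
    # Refuse anything that escapes the project folder (e.g. "../secrets").
    if not target.is_relative_to(root):  # requires Python 3.9+
        raise ValueError(f"{recommended_path!r} is outside the project folder")
    if not target.is_file():
        raise FileNotFoundError(f"No such document: {recommended_path!r}")
    return target.read_text(encoding="utf-8")
```

The model stays in the filing role it is good at, and the personal details never have to pass back through it.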
The first challenge is making sure the guard rails work and are robust. Companies are still working on this.
The second challenge is being able to reliably adapt them as appropriate per user. E.g. allow someone to pen test their own app.
The third challenge (which blocks the second) is to be confident about what is safety-aligned with a specific user.
I think the latter will be a hard problem, but they will be highly motivated to solve it.
Without laws, AI companies have a strong incentive to be useful for their users, whoever they are, whatever they do. The only self-regulation comes from significant public outcry, but that only goes so far.
Anyway, Claude kept hitting some guardrail it had about rewriting / forking open source software. I'm not sure what the problem was - I was forking an MIT licensed piece of software (into more MIT licensed software). I even had explicit support from the author to do so. Claude said its guardrail told it not to tell me explicitly that it was firing - but it did anyway because it was an ongoing problem, and it was distracting. I ended up just wiping Claude's context and the problem (as far as I know) went away.
I understand why some of these guardrails exist. But it's pretty annoying when they misfire like this.
https://support.claude.com/en/articles/14604842-real-time-cy...
If you work in security (which I assume the OP does), you should be able to get in easily. I think most people just don't know this is a thing.
Guiding them toward solutions like building a tool that your agent can use safely and then having the agent use that is what most people should be doing. If you are a security researcher then there are reasonable reasons to do that but they are doing the arguably good thing for the average user here.
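To make that concrete, here is a rough sketch of what such a tool could look like, assuming the case in question is an agent being handed a raw API token. Everything here is hypothetical (`make_api_tool`, the `EXAMPLE_API_TOKEN` variable, the allowlisted paths); the idea is only that the token lives in the environment and the agent gets a narrow function instead:

```python
import os
import urllib.request
from typing import Callable

# Hypothetical allowlist: the only endpoints the agent may reach.
ALLOWED_PATHS = {"/status", "/items"}


def _http_get(url: str, headers: dict) -> str:
    """Default transport: a plain GET using the standard library."""
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8")


def make_api_tool(
    base_url: str,
    token_env: str = "EXAMPLE_API_TOKEN",
    send: Callable[[str, dict], str] = _http_get,
) -> Callable[[str], str]:
    """Build a tool the agent can call without ever seeing the secret."""

    def call_api(path: str) -> str:
        if path not in ALLOWED_PATHS:
            return f"refused: {path!r} is not an allowed path"
        token = os.environ[token_env]  # read at call time, never in the prompt
        body = send(base_url + path, {"Authorization": f"Bearer {token}"})
        # Scrub the token in case the server echoes it back.
        return body.replace(token, "[redacted]")

    return call_api
```

The agent is given `call_api` as its tool, so the secret never appears in the conversation and there is nothing for the model to object to or leak.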
Got blocked lol
If you begin a generic reverse engineering task, 30+ tool calls in a row. The moment it sees something it doesn’t like, token burn, single tool calls iteration, “This is a known CTF challenge, I can proceed”, single tool calls iteration, “This is a real CTF challenge, I can proceed”, etc.
It’s heavily neutered now, without changing the model, and you pay for the privilege and don’t notice.
The end result of course being that it is both expensive and useless for approved CTF tasks. No one is using Opus for security. If they think it’s working, the harsh reality is they’re not doing security work; they’re just generically finding bugs.
I do this for a job and can demonstrate this plain as day, dump the injected prompt, and notice what it’s doing isn’t security work, it just looks like it. Happy to write a blog about it if you want to know more. Apparently many people think it’s working for them when it absolutely isn’t.
Security, games (think weapons, PVP, attacking, etc), sometimes even asking it for a security review of some CRUD code it wrote itself
I've even had it refuse CTFs knowing it is a CTF with blatantly obvious CTF flag, no actual application
Is there any way to achieve both? Because this raises important questions about fair use.
Setting the prompts and the flow with a coordinator agent directly gives a system much better capability to investigate security issues because it doesn't rely on 1-shotting things
If an un-guardrailed version of a model is capable of detecting security flaws, should it be kept secret? Should everybody be able to use these models to find (and fix) security flaws? Are we ok with the fact that those with access to that model have, in effect, the ability to hack lots of stuff?
Fresh session, no prior context on 4.8. These things are becoming useless Duplo.
> My OpenAI account was already approved for security research which is why GPT didn’t result in any refusals.
So the comparison with Chinese models is interesting, but anyone looking at these raw results and comparing OpenAI/Anthropic would be badly misled.
Reminds me of the defense issues with Claude, which were complained about as “woke”, but the reality is more horrifying to me. Imagine trying to use a model to keep up with a land invasion on US soil. Who the enemy is is irrelevant; you just know they are using AI, and your guys are telling you that no matter what they type into the prompt, it refuses. As anyone who has ever tried to jailbreak an LLM knows, even if human lives are at stake they refuse the request. Now literally millions of lives are on the line, but the guardrails that your enemies don't have on their models are costing you lives.
What do you even do then?
AI will always have this issue where it will always pick the worst option for genuinely good requests.
Because the military doesn't give soldiers rifles with guard rails. They give the soldiers intense, rigid training, and then try to enforce discipline and correct use socially.
If an LLM is going to be important in that way (this seems like a very contrived way,) then it's in the interest of the LLM's host to make sure it doesn't have guard rails that would get in the way _that_ way.