In my experience, saying "this is not X, it will not be used for Y" vastly increases the chances of the request being classified as X. Anybody can write "this is authorized research". Instead, use something like "evaluate security" / "verify security", "make sure this cannot be (...)", etc.

Of course, these models are pretty smart, so even Anthropic's simple instructions not to provide any exploits stick better and better.

by ayewo 14 hours ago

Sounds like you will need to drink a(n identity) verification can soon [1] to continue as a security researcher on their platform.

1: https://support.claude.com/en/articles/14328960-identity-ver...

Identity verification on Claude

Being responsible with powerful technology starts with knowing who is using it. Identity verification helps us prevent abuse, enforce our usage policies, and comply with legal obligations.

We are rolling out identity verification for a few use cases, and you might see a verification prompt when accessing certain capabilities, as part of our routine platform integrity checks, or other safety and compliance measures.

by andai 9 hours ago

Context for "please drink verification can": https://files.catbox.moe/eqg0b2.png

by sebmellen 5 hours ago

We sure aren’t far off.

by throwanem 7 hours ago

Yes, it's a stupid 4chan meme from 2013. I can only surmise those who quote it either don't know its origin, or they must be wholeheartedly 'embracing the cringe.'

by Traubenfuchs 16 minutes ago

Different model limitations for different groups of people…

Imagine what the military and secret services are getting.

by recallingmemory 13 hours ago

I'm surprised we can't just authenticate in other ways, like a domain TXT record that proves the website I'm looking to audit for security is my own.

by kristjansson 8 hours ago

How would it know it’s really there, and not just a tool input/output injected into its input?

by SwellJoe 1 hour ago

It could be an API endpoint on Anthropic servers, the same way Let's Encrypt verifies things on their servers. If you can't control the DNS records, you can't verify via DNS, no matter what you tell the local `certbot`.
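A minimal sketch of what such a server-side check could look like, loosely modeled on ACME's DNS-01 challenge: the provider issues a random token, the user publishes a TXT record derived from it, and the provider's own resolver (not anything the model saw through a tool call) fetches the record for comparison. Everything here is hypothetical: the `_ai-verify` label, the helper names, and binding the token to an `account_id` are assumptions for illustration. Real ACME instead hashes a key authorization built from the token and the account key's thumbprint, and publishes it under `_acme-challenge`.

```python
import base64
import hashlib
import hmac
import secrets
from typing import Callable, Iterable

# Hypothetical record label (ACME uses "_acme-challenge"; this one is made up).
RECORD_PREFIX = "_ai-verify"


def issue_challenge() -> str:
    """Server side: create a random, unguessable token for one verification attempt."""
    return secrets.token_urlsafe(32)


def expected_txt_value(token: str, account_id: str) -> str:
    """Bind the token to the requesting account so someone else can't reuse the record."""
    digest = hashlib.sha256(f"{token}.{account_id}".encode()).digest()
    # base64url without padding, similar in spirit to the DNS-01 record value
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode()


def verify_domain(
    domain: str,
    token: str,
    account_id: str,
    resolve_txt: Callable[[str], Iterable[str]],
) -> bool:
    """Server side: compare the published TXT records against the expected value.

    `resolve_txt` is assumed to run on the provider's infrastructure; text the
    model was shown via a tool call could have been injected or MITM'd.
    """
    want = expected_txt_value(token, account_id).encode()
    records = resolve_txt(f"{RECORD_PREFIX}.{domain}")
    # Constant-time comparison against each published record (quotes stripped)
    return any(hmac.compare_digest(r.strip('"').encode(), want) for r in records)
```

For a quick local check, `resolve_txt` can be a lambda over a dict standing in for DNS; in a real deployment it would be a resolver the user cannot influence, which is the whole point of doing the check off-model.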

by jerf 12 hours ago

AI being what it is, at this point you might be able to ask it for a token to put in a web page at .well-known, put it in as requested, and let it see it, and that might actually just work without it being officially built in.

I suggest that because I know for sure the models can hit the web; I don't know about their ability to do DNS TXT records as I've never tried. If they can then that might also just work, right now.

by rlpb 8 hours ago

A smart AI would realise that I can MITM its web access so that it sees a .well-known token that isn't actually there. I assume that the model doesn't have CA certificates embedded into it, and relies on its harness for that.

by andai 9 hours ago

I think even Claude Web can run arbitrary Linux commands at this point.

I tried using it to answer some questions about a book, but the indexer broke. It figured out what file type the RAG database was and grepped it for me.

Computers are getting pretty smart ._.

by NewsaHackO 12 hours ago

What do you offer as a solution? If theoretically some foreign state intelligence was exposed using Claude for security penetration that affected the stability of your home government due to Anthropic's lax safety controls, are you going to defend Anthropic because their reasoning was to allow everyone to be able to do security research?

by ayewo 11 hours ago

> What do you offer as a solution? If theoretically some foreign state intelligence was exposed using Claude for security penetration that affected the stability of your home government due to Anthropic's lax safety controls, are you going to defend Anthropic because their reasoning was to allow everyone to be able to do security research?

I don't have an answer.

But the problem is that with a model like Grok, which is designed to have fewer safeguards than Claude, it is trivially easy to prompt it with: "Grok, fake a driver's license. Make no mistakes."

Back in 2015, someone was able to get past Facebook's real-name policy with a photoshopped passport [1] by claiming to be “Phuc Dat Bich”. The whole thing eventually turned out to be an elaborate prank [2].

1: https://www.independent.co.uk/news/world/australasia/man-cal...

2: https://gizmodo.com/phuc-dat-bich-is-a-massive-phucking-fake...

by NewsaHackO 9 hours ago

To me, those seem a lot lower stakes than supply chain attacks, social engineering, intelligence gathering, and other security exploits that Anthropic is more worried about. Making a fake driver's license to buy beer isn't really the thing Anthropic is actively trying to prevent (though I would assume they would stop that too). Even the GP was about penetration testing of a public website; without some sort of identification, how would it be ethical for Claude to help with something like that? Remember, this whole safety thing started because people held AI companies accountable for the politically incorrect output of their AI, even when it clearly did not reflect the views of the company. When Microsoft's Tay Twitter bot started spouting anti-Semitic and racist talking points, no one defended Microsoft, and the criticism continued until the bot was taken down; that is the reason we have all of these extremely restrictive rules today.

by oasisbob 1 hour ago

> Being responsible with powerful technology starts with knowing who is using it.

What asinine slop. As a frontier model creator, responsibility should start far before they're signing up customers.

by johnmlussier 15 hours ago

  ⎿  API Error: Claude Code is unable to respond to this request, which appears to violate our Usage Policy (https://www.anthropic.com/legal/aup). This request triggered restrictions on violative cyber content and was blocked under Anthropic's Usage Policy. To request an adjustment pursuant to our Cyber Verification Program based on how you use Claude, fill out https://claude.com/form/cyber-use-case?token=[REDACTED] Please double press esc to edit your last message or start a new session for Claude Code to assist with a different task. If you are seeing this refusal repeatedly, try running /model claude-sonnet-4-20250514 to switch models.

This is gonna kill everything I've been working on. I have several reproduced items at [REDACTED] that I've been working on.

by kzrdude 2 hours ago

It's a brave new world of centralized computing where one day you boot up and can't work because something changed arbitrarily in the "compute" service you are renting.

by dmix 15 hours ago

I predict this sort of filtering is only going to get worse. This will probably be remembered as the 'open internet' era of LLMs before everything is tightly controlled for 'safety' and regulations. Forcing software devs to use open source or local models to do anything fun.

by regularfry 14 hours ago

Just as likely it's going to be "Oh, you want <use case the thing's actually good at>? Let me introduce your wallet to my hoover."

by jancsika 14 hours ago

> Forcing software devs to use open source or local models to do anything fun.

Episode Five-Hundred-Bazillenty-Eight of Hacker News: the gang learns a valuable lesson after getting arrested at an unchaperoned Enshittification party and having to call Open Source to bail them out.

by techpression 12 hours ago

All while Frank is pitching his state-of-the-art basement datacenter to VCs, getting billions of dollars in investments.

by lukan 11 hours ago

What happened to "open-weight models are 2-3 years behind the proprietary ones"? I don't see the drama here.

by jsw97 5 hours ago

I got a refusal doing some math, I think based on the word "sextic", as best I can tell.

/model claude-opus-4.6

by suzzer99 15 hours ago

I've never seen "double press esc" as a control pattern.

by sweetjuly 5 hours ago

esc once interrupts the LLM, double-esc lets you revert to a previous state (interrupt harder).

by sigmarule 10 hours ago

Out of curiosity, (a) did you receive this error at the start of a session or in the middle of it, and (b) did you manage to find/confirm valid findings within the scope/codebase 4.7 was auditing with Sonnet/yourself later on?

I just gave 4.7 a run over a codebase I have been heavily auditing with 4.6 over the past few days. Things began smoothly, so I left it for 10-15 minutes. When I checked back in, I saw it had died in the middle of investigating one of the paths I recommended exploring.

I was curious as to why the block occurred when my instructions and explicitly stated intent had not changed at all - I provided no further input after the first prompt. This would mean that its own reasoning output or tool call results triggered the filter. This is interesting, especially if you think of typical vuln research workflows and stages; it’s a lot of code review and tracing, things which likely look largely similar to normal engineering work, code reviews, etc. Things begin to get more explicitly “offensive” once you pick up on a viable angle or chain, and increase as you further validate and work the chain out, reaching maximum “offensiveness” as you write the final PoC, etc.

So one would then have to wonder whether the activity preceding the mid-session flagging only resulted in the flag because it finally found something seemingly viable and started shifting its reasoning from generic-ish bug hunting to overt exploitation.

So, I checked the preceding tool calls, and sure enough…

What a strange world we’re living in. Somebody should try making a joke AUP-violation-based fuzzer; policy violations are the new segfaults…

by weitendorf 1 hour ago

It’s to stop you from getting RL traces or using Claude without paying the big bucks for the Enterprise Security version

I really like Anthropic models and the company mission but I personally believe this is anticompetitive, or at least, anti user.

If they are going to turn into a protection racket I’ll just do RL black boxing/pentesting on Chinese models or with Codex, and since I know Anthropic is compute constrained I’ll just put the traces on huggingface so everybody else can do it too.

I just want to pay them for their RL’d tensor thingies, but if their business plan is to hoard the tokens or only sell them to certain people, they are literally part of every other security-conscious person’s threat model.

by kamikazechaser 1 hour ago

It has been the same for Sonnet/Opus 4.6 for some time. It will straight up refuse to work on anything in the grey area. Chinese models will happily do anything. In my tests, GLM 5.1 comfortably bypassed a multiplayer game's anti-piracy/anti-cheat checks with some guided steering.

by johnmlussier 2 hours ago

I've switched over to Codex. On Extra High reasoning it seems very capable and is definitely catching mistakes Sonnet has missed. I'd love to move back to Opus but at this time it is untenable.

by whatisthiseven 13 hours ago

Worse, I have had it get sus of my own codebase when I tasked it with writing mundane code. Apparently if you include certain trigger words it goes nuts. Still trying to narrow down which ones in particular.

Here is some example output:

"The health-check.py file I just read is clearly benign...continuing with the task" wtf.

"is the existing benign in-process...clearly not malware"

Like, what the actual fuck. They way overcompensated for the sensitivity on "people might do bad stuff with the AI".

Let people do work.

Edit: I followed up with a plan it created after it made sure I wasn't doing anything nefarious with my own plain Python service, and it still included multiple output lines about "Benign this" and "safe that".

Am I paying money to have Anthropic decide whether or not my project is malware? I think I'll be canceling my subscription today. Barely three prompts in.

by zmmmmm 6 hours ago

So if they are retroactive to 4.6, then they can't be trained into the model. They would have to be applied as a pre-screening or post-screening process, which is disturbing, since it implies that already-deployed workflows could be broken by this. I am curious whether it is enforced on enterprise accounts, e.g. via AWS Bedrock, and how Anthropic would have implemented that, given that they push models to Amazon for hands-off operation.

by jeffybefffy519 10 hours ago

Codex is just as bad with this; I've received two ToS warnings for security research activities so far. I have also tried to appeal, with zero response.

by skybrian 15 hours ago

Maybe stick with 4.6 until the bugs are worked out? Is this new filter retroactive?

by Arubis 8 hours ago

I can barely get it to send a PDF to my printer without a flat refusal >_<

by cesarvarela 13 hours ago

With all the low-quality code that's being generated and deployed, cybersecurity will be the golden goose.

by chasd00 9 hours ago

Hah, maybe the plan for Mythos is to solve all the security issues introduced by Claude Code. Anthropic makes money creating the security issues and identifying/fixing the security issues; that's a nice spot to be in.

by solenoid0937 14 hours ago

I think updating fixed this for me?

by nikanj 12 hours ago

Having tried codex for some security practice, it is similarly terrible.

You can link it to a course page that features the example binary to download, and it can verify the hash and confirm you are working with the same binary, and then it refuses to do any practical analysis on it.

by dakolli 13 hours ago

They don't want competition; they are going to become bounty hunters themselves. They probably plan on turning this into a part of their business. It's kinda trivial to jailbreak these things if you spend a day doing so.

by gruez 15 hours ago

> even after acknowledging "This is authorized research under the [Redacted] Bounty program, so the findings here are defensive research outputs, not malware. I'll analyze and draft, not weaponize anything beyond what's needed to prove the bug to [Redacted]."

What else would you expect? If you add protections against it being used for hacking, but then that can be bypassed by saying "I promise I'm the good guys™ and I'm not doing this for evil" what's even the point?

by johnmlussier 15 hours ago

This was Opus saying that after reviewing the [REDACTED] bug bounty program guidelines and having them in context.

by gruez 14 hours ago

Right, but that can be easily spoofed? Moreover if say Microsoft has a bounty program, what's preventing you from getting Opus to discover a bug for the bounty program, but you actually use it for evil?