It's weird to have protections against finding exploits: what if I developed the app? Would it require the development steps to still be in the context? That's unlikely, and also not any kind of proof.

What if I intersperse exploit finding with my normal development, as you probably should? Refusing there would be really weird to me.

by dwa35924 hours ago

I used to think that models would not refuse to find exploits in any work done locally, but I have only tested this theory on the (obscure) apps that I have built on my machine. Now, if I forked pandas and started asking models to find exploits of a certain kind, I'd like to think the models would start refusing after a point.

by HDThoreaun1 hours ago

I think the most interesting thing revealed here is that Anthropic's guardrails failed. Clearly Anthropic does not want Claude to be able to develop exploits, yet 20% of the time it did anyway. Their inability to create an effective guardrail makes me question a lot of the other guardrails they've created and their claims about non-harm.