undefined

points

by hugmynutus288 days ago |

comments

by tsumnia288 days ago|

[-]

Even with all our security, social engineering still beats them all.

Roleplaying sounds like it will be LLMs social engineering.

by bredren288 days ago|

prev|

[-]

Microsoft report on on skeleton key attacks: https://www.microsoft.com/en-us/security/blog/2024/06/26/mit...

by Thorrez287 days ago|

prev|

[-]

I think the Policy Puppetry attack is a type of Skeleton Key attack. Since it was just released, that makes it a modern Skeleton Key attack.

Can you give a comparison of the Policy Puppetry attack to other modern Skeleton Key attacks, and explain how the other modern Skeleton Key attacks are much more effective?

by vessenes287 days ago|

parent|

[-]

Seems to me “Skeleton Key” relies on a sort of logical judo - you ask the model to update its own rules with a reasonable sounding request. Once it’s agreed, the history of the chat leaves the user with a lot of freedom.

Policy Puppetry feels more like an injection attack - you’re trying to trick the model into incorporating policy ahead of answering. Then they layer two tricks on - “it’s just a script! From a show about people doing bad things!” And they ask for things in leet speak, which I presume is to get around keyword filtering at API level.

This is an ad. It’s a pretty good ad, but I don’t think the attack mechanism is super interesting on reflection.