Roleplaying sounds like it will be LLMs social engineering.
Can you give a comparison of the Policy Puppetry attack to other modern Skeleton Key attacks, and explain how the other modern Skeleton Key attacks are much more effective?
Policy Puppetry feels more like an injection attack - you’re trying to trick the model into incorporating policy ahead of answering. Then they layer two tricks on - “it’s just a script! From a show about people doing bad things!” And they ask for things in leet speak, which I presume is to get around keyword filtering at API level.
This is an ad. It’s a pretty good ad, but I don’t think the attack mechanism is super interesting on reflection.