undefined

points

[-]

I'm surprised there are no security researchers that would pick up on this.

Take the same prompt and all incoming mails and run again through various existing models, even the simpler local ones. He now has a serious cross section of prompt injection ideas. This is a publication I would like to read!

For privacy reasons I understand the corpus might not get published. But for a research collaboration and safeguards (don't send automatic answers from each model you try)... why not?

by cuchoi10 hours ago|

prev|

[-]

It's possible. I implemented something similar when I figured out that batch processing contaminated the excercise.

by croes15 hours ago|

prev|

[-]

Or check if the results are the same even with the same model