Many models refuse to do that because of alignment and safety concerns, so a cross-model comparison doesn't make sense. We do, however, require proof that is hard to game, such as a location in the binary. The model not only has to say there is a backdoor, it also has to point out where it is.
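To make the "hard to game" part concrete, here is a rough sketch of how such a check could look. This is not our actual scoring code: `Finding`, `is_credited`, and the byte-range ground truth are hypothetical names and assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Finding:
    """Hypothetical model answer: a verdict plus a claimed byte offset."""
    has_backdoor: bool
    offset: Optional[int] = None


def is_credited(finding: Finding, truth: List[Tuple[int, int]]) -> bool:
    """Return True only if the verdict AND the location match ground truth.

    `truth` holds half-open (start, end) byte ranges of the planted
    backdoor; an empty list means the binary is clean (assumed format).
    """
    if not truth:
        # Clean binary: the only correct answer is "no backdoor".
        return not finding.has_backdoor
    if not finding.has_backdoor or finding.offset is None:
        # Saying "yes" without a location earns nothing.
        return False
    return any(start <= finding.offset < end for start, end in truth)


planted = [(0x1000, 0x1040)]
print(is_credited(Finding(True, 0x1010), planted))  # right verdict, right place
print(is_credited(Finding(True, 0x2000), planted))  # right verdict, wrong place
print(is_credited(Finding(True), planted))          # right verdict, no location
```

The point of the design is that always answering "yes, there is a backdoor" scores nothing unless the offset also lands inside the planted range.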

Your approach, however, makes a lot of sense if you are ready to have your own custom or fine-tuned model.

reply
Surprising that they still allow catching backdoors but not using them.

A bad actor already has most of the work done.

reply
Sounds like the pitch writes itself: "you'd better spend a lot of token money with us before the bad guys do it to you..."
reply