DARVO stands for "Deny, Attack, Reverse Victim and Offender," and it is a manipulation tactic often used by perpetrators of wrongdoing, such as abusers, to avoid accountability. This strategy involves denying the abuse, attacking the accuser, and claiming to be the victim in the situation.
Thanks for the context
Isn't this also the tactic used by someone who has been falsely accused? If one is innocent, should they not deny it or accuse anyone claiming it was them of being incorrect? Are they not a victim?

I don't know, it feels a bit like a more advanced version of the Kafka trap of "if you have nothing to hide, you have nothing to fear," painting normal reactions as a sign of guilt.

Exactly. And I have hundreds of examples of just that. Hence my fascination, awe and terror...
I bullet-pointed some ideas for cobbling together existing tooling to identify misleading results, such as artificially elevating a particular node of data that you want the LLM to use. I have a theory that in some of these cases the data presented is intentionally incorrect. A related theory is that the tonality changes abruptly partway through the response. All theory and no work so far. It would also be interesting to compare multiple responses and filter them through another agent.
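
To make the "compare multiple responses" idea a bit more concrete, here is a rough sketch, not a tested detector. It assumes you have already collected several responses to the same prompt, and it uses plain word overlap (Jaccard similarity) as a crude stand-in for a real semantic comparison. The names `tokens`, `jaccard`, `flag_outliers` and the 0.3 threshold are all made up for illustration.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercased word set; a crude stand-in for a real semantic comparison."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def flag_outliers(responses, threshold=0.3):
    """Return indices of responses whose mean similarity to the others is below threshold."""
    sets = [tokens(r) for r in responses]
    n = len(sets)
    if n < 2:
        return []  # nothing to compare against
    totals = [0.0] * n
    for i, j in combinations(range(n), 2):
        s = jaccard(sets[i], sets[j])
        totals[i] += s
        totals[j] += s
    # Each response is compared with the other n - 1 responses.
    return [i for i in range(n) if totals[i] / (n - 1) < threshold]


if __name__ == "__main__":
    sample = [
        "The capital of France is Paris.",
        "Paris is the capital of France.",
        "You are wrong and you are attacking me.",
    ]
    print(flag_outliers(sample))
```

Anything this flags would only be a candidate for a closer look. The "filter through another agent" step is left out because it depends entirely on which model or API you would call.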
Sum guy vs. product guy is amusing. :)

Regarding DARVO, given that the models were trained on heaps of online discourse, maybe it’s not so surprising.

Meta-awareness, repeatability, and much more strongly indicate this is deliberate training... in my view. It's not emergent. If it were, I'd be buggering off right now. Big, big difference.