It makes more sense if Anthropic is assuming that most flagged conversations are false positives (but it wants to keep Mythos away from the true positives).