How do you know simple detection was the most Anthropic did, and nothing more actively nefarious? The self-reported motivation was animus against "distillation attacks", which suggests active server-side countermeasures that could range from the overt (IP or user account bans) to the covert (downgrading model performance or poisoning the response).