points
user message becomes close to untrusted compared to dev prompt.
also post train it only outputs things like safe/unsafe so you are relatively deterministic on injection or no injection.
ie llama prompt guard, oss 120 safeguard.