Most proper LLM guardrails products use both.
Edit: actually looks like it has two policy engines embedded
user message becomes close to untrusted compared to dev prompt.
also post train it only outputs things like safe/unsafe so you are relatively deterministic on injection or no injection.
ie llama prompt guard, oss 120 safeguard.
If people said "we build a ML-based classifier into our proxy to block dangerous requests" would it be better? Why does the fact the classifier is a LLM make it somehow worse?
The entire purpose of LLMs is to be non-static: they have no deterministic output and can't be validated the same way a non-LLM function can be. Adding another LLM layer is just adding another layer of swiss cheese and praying the holes don't line up. You have no way of predicting ahead of time whether or not they will.
You might say this hasn't prevented leaks/CVEs in exisiting mission-critical software and this would be correct. However, the people writing the checks do not care. You get paid as long as you follow the spec provided. How then, in a world which demands rigorous proof do you fit in an LLM judge?
This is exactly the point though. A LLM is great at finding work-around for static defenses. We need something that understands the intent and responds to that.
Static rules are insufficient
EDIT: it does seem to have a deterministic layer too and I think that's great