The fact that LLMs are "smarter" is also their weakness. An old-school classifier is far from foolproof, but you won't get past it by telling it about your grandma's bedtime story routine.

by reassess_blind 9 hours ago

Fairly hard to bypass the latest LLMs with grandma's bedtime story these days, to be fair.

by Retr0id 9 hours ago

That specific trick, yes, but the general concept still applies.

by reassess_blind 9 hours ago

It does, but it's certainly not trivial. In fact there's an unclaimed $1000 bounty on prompt injecting OpenClaw: https://hackmyclaw.com/

by DANmode 7 hours ago

Is that enough?

by reassess_blind 2 hours ago

Enough for what?

by waterTanuki 11 hours ago

If you're working in a mission-critical field like healthcare, defense, etc., you need a way to make static and verifiable guarantees that you can't leak patient data, fighter jet details, etc. through your software. This is either mandated by law or in your contract details.

The entire purpose of LLMs is to be non-static: they have no deterministic output and can't be validated the same way a non-LLM function can be. Adding another LLM layer is just adding another layer of swiss cheese and praying the holes don't line up. You have no way of predicting ahead of time whether or not they will.

You might say this hasn't prevented leaks/CVEs in existing mission-critical software, and this would be correct. However, the people writing the checks do not care. You get paid as long as you follow the spec provided. How then, in a world which demands rigorous proof, do you fit in an LLM judge?

by nl 8 hours ago

> The entire purpose of LLMs is to be non-static: they have no deterministic output and can't be validated the same way a non-LLM function can be. Adding another LLM layer is just adding another layer of swiss cheese and praying the holes don't line up. You have no way of predicting ahead of time whether or not they will.

This is exactly the point though. An LLM is great at finding workarounds for static defenses. We need something that understands the intent and responds to that.

Static rules are insufficient.