undefined

points

by beeflet5 hours ago |

comments

by ViscountPenguin4 minutes ago|

[-]

The good regulator theorem makes that a little difficult.

by OptionOfT13 minutes ago|

prev|

[-]

I'm working on new technology where you separate the instructions and the variables, to avoid them being mixed up.

I call it `prepared prompts`.

by WhitneyLand2 hours ago|

prev|

[-]

It’s not that simple.

That would result in a brittle solution and/or cat and mouse game.

The text that goes into a prompt is vast when you consider common web and document searches are.

It’s going to be a long road to good security requiring multiple levels of defense and ongoing solutions.

by moregrist2 hours ago|

parent|

[-]

If only we had a reliable way to detect that a poster was being sarcasm or facetious on the Internet.

by ponector1 hours ago|

parent|

[-]

The solution is to sanitize text that goes into the prompt by creating a neural network that can detect sarcasm.

by int_19h3 minutes ago|

parent|

[-]

Unfortunately it takes ~9 months just to build that network up to the point where you can start training it, and then the training itself is literally years of hard effort.

by kristianc28 minutes ago|

parent|

prev|

[-]

Ah, the Seinfeld Test.

by ares6231 hours ago|

parent|

prev|

[-]

A sarcasm machine is finally within our reach

by dgfitz2 hours ago|

parent|

prev|

[-]

I assumed beeflet was being sarcastic.

There’s no way it was a serious suggestion. Holy shit, am I wrong?

by beeflet2 hours ago|

parent|

[-]

I was being half-sarcastic. I think it is something that people will try to implement, so it's worth discussing the flaws.

by zhengyi133 hours ago|

prev|

[-]

Turtles all the way down; got it.

by horizion20254 hours ago|

prev|

[-]

Isn't that just another guardrail that can be bypassed much the same as the guard rails are currently quite easily bypassed? It is not easy to detect a prompt. Note some of the recent prompt injection attack where the injection was a base64 encoded string hidden deep within an otherwise accurate logfile. The LLM, while seeing the Jira ticket with attached trace , as part of the analysis decided to decode the b64 and was led a stray by the resulting prompt. Of course a hypothetical LLM could try and detect such prompts but it seems they would have to be as intelligent as the target LLM anyway and thereby subject to prompt injections too.

by wrs3 hours ago|

parent|

[-]

Yep.

https://gandalf.lakera.ai/baseline

by Huppie2 hours ago|

parent|

[-]

This is genius, thank you.

by darepublic2 hours ago|

parent|

prev|