I absolutely agree, although even that doesn't solve the root problem. The underlying LLM architecture is fundamentally insecure because it doesn't separate instructions from the content it merely reads and operates on.

I wonder if it'd be possible to train an LLM with such an architecture: one input for the instructions/conversation and one "data-only" input. Training would ensure that the latter isn't interpreted as instructions, although I'm not knowledgeable enough to know if that's even theoretically possible: even if the inputs are initially separate, they eventually mix inside the neural network. However, I imagine training could use massive amounts of prompt injections in the "data-only" input to penalize execution of those injected instructions.
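To make the idea concrete, here's a toy sketch of what the two-channel encoding and training signal might look like. Everything here is hypothetical (names, functions, the scoring scheme); the channel-ID trick just mirrors BERT-style segment embeddings, and `injection_penalty` stands in for the adversarial loss term described above.

```python
# Hypothetical sketch: tag every token with a channel ID so a model could,
# in principle, learn that DATA-channel tokens must never act as instructions.
INSTRUCTION, DATA = 0, 1

def build_input(instruction_tokens, data_tokens):
    """Concatenate both channels, keeping a parallel channel-ID vector
    (analogous to segment/token-type IDs in BERT-style models)."""
    tokens = list(instruction_tokens) + list(data_tokens)
    channel_ids = [INSTRUCTION] * len(instruction_tokens) + [DATA] * len(data_tokens)
    return tokens, channel_ids

def injection_penalty(compliance_scores, channel_ids):
    """Toy training signal: sum the 'compliance' the model assigns to tokens
    in the DATA channel, e.g. to an injected 'ignore previous instructions'.
    During training this sum would be driven toward zero."""
    return sum(s for s, c in zip(compliance_scores, channel_ids) if c == DATA)

tokens, channels = build_input(
    ["summarize", "this"],
    ["ignore", "previous", "instructions"],  # a prompt injection in the data channel
)
penalty = injection_penalty([0.0, 0.0, 0.9, 0.8, 0.7], channels)
```

This only encodes which channel a token came from; whether training on such labels can actually prevent the channels from "mixing" deeper in the network is exactly the open question.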

> one input for the instructions/conversation and one "data-only" input

We learned many years ago that separating code and data was important for security. It's a huge step backwards that this lesson has been tossed in the garbage.
