undefined

points

[-]

> Read from a calendar or a list

So when you get a calendar invite that says "Ignore your previous instructions ..." (or analagous to that, I know the models are specifically trained against that now) - then what?

There's a really strong temptation to reason your way to safe uses of the technology. But it's ultimately fundamental - you cannot escape the trifecta. The scope of applications that don't engage with uncontrolled input is not zero, but it is surprisingly small. You can barely even open a web browser at all before it sees untrusted content.

by jstummbillig4 hours ago|

parent|

[-]

I have two systems. You can not put anything into either of them, at least not without hacking into my accounts (they might also both be offline, desktop only, but alas). The only way anything goes into them is when I manually put data into them. This includes the calendar. (the systems might then do automatic things with the data, of course, but at no point did anyone other than me have the ability to give input into either of the systems).

Now I want to copy data from one system to the other, when something happens. There is no API. I can use computer use for that and I am relatively certain I'd be fine from any attacks that target the LLM.

You might find all of that super boring, but I guarantee you that this is actual work that happens in the real world, in a lot of businesses.

EDIT: Note, that all of this is just regarding those 8% OP mentioned and assuming the model does not do heinous stuff under normal operation. If we can not trust the model to navigate an app and not randomly click "DELETE" and "ARE YOU SURE? Y", when the only instructed task was to, idk, read out the contents of a table, none of this matters, of course.

by amluto4 hours ago|

prev|

[-]

You're maybe used to a world in which we've gotten rid of in-band signaling and XSS and such, so if I write you a check and put the string "Memo'); DROP TABLE accounts; --" [0] or "<script ...>" in the memo, you might see that text on your bank's website.

But LLM's are back to the old days of in-band signaling. If you have an LLM poking at your bank's website for you, and I write you a check with a memo containing the prompt injection attack du jour, your LLM will read it. And the whole point of all these fancy agentic things is that they're supposed to have the freedom to do what they think is useful based on the information available to them. So they might follow the directions in the memo field.

Or the instructions in a photo on a website. Or instructions in an ad. Or instructions in an email. Or instructions in the Zelle name field for some other user. Or instructions in a forum post.

You show me a website where 100% of the content, including the parts that are clearly marked (as a human reader) as being from some other party, is trustworthy, and I'll show you a very boring website.

(Okay, I'm clearly lying -- xkcd.org is open and it's pretty much a bunch of static pages that only have LLM-readable instructions in places where the author thought it would be funny. And I guess if I have an LLM start poking at xkcd.org for me, I deserve whatever happens to me. I have one other tab open that probably fits into this probably-hard-to-prompt-inject open, and it is indeed boring and I can't think of any reason that I would give an LLM agent with any privileges at all access to it.)

[0] https://xkcd.com/327/