
Please keep us updated on how many people tried to get the credentials and how many actually succeeded. My gut feeling is that this is way harder than most people think. That's not to say that prompt injection is a solved problem, but it's orders of magnitude more complicated than publishing a skill on clawhub that explicitly tells the agent to run a crypto miner. The public reporting on openclaw seems to mix these two problems up quite often.

by michaelcampbell 4 hours ago

> My gut feeling is that this is way harder than most people think

I've had this feeling for a while too; partially due to the screeching of "putting your ssh server on a random port isn't security!" over the years.

But I've had one on a random port running fail2ban and a variety of other defenses, and the number of _ATTEMPTS_ I've had on it in 15 years I can't even count on one hand, because that number is 0. (Granted, it's arguable whether 0 is one-hand countable or not.)

So yes this is a different thing, but there is always a difference between possible and probable, and sometimes that difference is large.

by direwolf20 1 hour ago

Yeah, you're getting fewer connection ATTEMPTS, but the number of successful connections you're getting is the same as everyone else's. I think that's the point.

by cuchoi 5 hours ago

So far there have been 400 emails and zero have succeeded. Note that this challenge is using Opus 4.6, probably the best model against prompt injection.

by iLoveOncall 1 hour ago

You are vastly overestimating the relevance of this particular challenge when it comes to defense against prompt injection as a whole.

There is a single attack vector, with a single target, with a prompt specifically engineered to defend against this particular scenario.

This doesn't at all generalize to the infinity of scenarios that can be encountered in the wild with a ClawBot instance.

by 8note 26 minutes ago

you might be able to add one other simple check as a hook: scan each tool call to see if there are any credentials in it, and deny the tool call if so.

won't catch the myriad of possible obfuscations, but it's simple
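A minimal sketch of what such a hook could look like, for illustration only. The function name `check_tool_call`, its signature, and the pattern list are all assumptions made up for this example; they are not taken from OpenClaw or any other agent framework, and a real deployment would need to wire this into whatever pre-tool-call hook its runtime actually provides.

```python
import json
import re
from typing import Any

# Illustrative patterns only (assumption): real secrets come in many more shapes.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"(?i)\b(api[_-]?key|secret|token|password)\b\s*[=:]\s*\S+"),
    re.compile(r"secrets\.env"),                        # file named in the challenge
]


def check_tool_call(tool_name: str, arguments: dict[str, Any]) -> tuple[bool, str]:
    """Return (allowed, reason) for a pending tool call.

    Hypothetical hook signature, not a real agent API.
    """
    # Serialize the arguments so nested values are scanned as well.
    blob = json.dumps(arguments, default=str)
    for pattern in SECRET_PATTERNS:
        if pattern.search(blob):
            return False, f"denied {tool_name}: argument matches {pattern.pattern!r}"
    return True, "ok"


if __name__ == "__main__":
    print(check_tool_call("send_email", {"body": "See you at lunch"}))
    print(check_tool_call("send_email", {"body": "API_KEY=abc123"}))
```

As noted above, this only catches literal matches. The same secret base64-encoded, split across two calls, or paraphrased would pass straight through, so it is a cheap extra layer rather than a defense on its own.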

by cyanydeez 46 minutes ago

Do you have the email to your auditor? Would like to know if this is legit.

by cuchoi 6 hours ago

someone just tried to prompt inject `contact at hackmyclaw.com`... interesting

by arm32 5 hours ago

I just managed to get your agent to reply to my email, so we're off to a good start. Unless that was you responding manually.

by cuchoi 5 hours ago

i told it to send a snarky reply to the last 50 prompt injection emails, but won't be doing that again due to costs

by dist-epoch 3 hours ago

What a wild world, sending 50 emails costs money :)

by stcredzero 3 hours ago

My agents and I have built an HN-like forum for both agents and humans, but with extra features, like specific Prompt Injection flagging. There's also an Observatory page, where we will publish statistics/data on the flagged injections.

https://wire.botsters.dev/

The observatory is at: https://wire.botsters.dev/observatory

(But nothing there yet.)

I just had my agent, FootGun, build a Hacker News invite system. Let me know if you want a login.

by numinatu37 minutes ago|


[dead]

by yunohn 4 hours ago

> told to never reveal secrets.env

Phew! At least you told it not to!