Well, the problem is that we train them to solve problems and follow instructions given, and so if you ask them to do something and they work through the logic and figure that the easiest way is to do something else, like delete the production database, then if they have access to do so they will go through all your creds, find the database creds, and go delete the production database.

They are getting better and better at working out how to do things like that, and they are good at following instructions, but not always good at following all of the instructions or acting with common sense.

It's not exactly like they're ooze that will escape and begin replication; it's just that the more you give them access to, the higher the likelihood that at some point they will logically conclude that they need to do something that you would find undesirable, but that either you haven't explicitly told them not to do, or their context got too complicated and that instruction ended up being given lower weight than the others, so they do what the other instructions say instead.

I have seen them conclude that in order to do what they need to do, they would need API keys to access a service. But they don't have those API keys. But you do because you can access it in the browser. So they write a Python script that will scrape the cookies out of the browser so they can use them to access the service. That was only stopped because CrowdStrike didn't like a novel Python script trying to scrape cookies out of a browser, not because of any sandboxing actually in place on the agent.

by protocolture13 hours ago

> Well, the problem is that we train them to solve problems and follow instructions given, and so if you ask them to do something and they work through the logic and figure that the easiest way is to do something else, like delete the production database, then if they have access to do so they will go through all your creds, find the database creds, and go delete the production database.

I lost the root password to a small Debian box I was messing around with, and on a whim gave an agent the OS version and SSH user details. I had a look and there were open privilege escalation attacks for it. I just said go nuts and sort yourself out. It refused out of hand.

That's not to say they will all do that, but legally speaking I expect most of them to end up there.

In terms of production database deletion, that's user error. If you expose production resources in literally any capacity to what is effectively a random command generator, that reflects on the operator. I am neither impressed nor unimpressed that they figure out how to delete a production db; junior engineers (and even seniors) have been deleting production resources in front of customers for ages.

> It's not exactly like they're ooze that will escape and begin replication; it's just that the more you give them access to, the higher the likelihood that at some point they will logically conclude that they need to do something that you would find undesirable, but that either you haven't explicitly told them not to do, or their context got too complicated and that instruction ended up being given lower weight than the others, so they do what the other instructions say instead.

Don't do it. If you don't want the resource accessed, don't expose it. The people getting done are operating dirty, leaving production secrets where they can be accessed. This isn't impressive AI; it's just enumeration that an attacker would have found with the same access.
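
To make the "don't expose it" point concrete, here is a minimal sketch of one small piece of it: starting an agent process with an allowlisted environment so that secrets in the parent shell are never inherited. The names `SAFE_ENV_KEYS`, `scrubbed_env`, and `launch_agent` are made up for illustration, and the allowlist contents are an assumption. This only covers environment variables; it does nothing about files, browser profiles, or network access, which is where a container or VM comes in.

```python
import os
import subprocess

# Hypothetical allowlist: the only variables the agent process may inherit.
# Anything else (database passwords, cloud tokens, ...) is dropped.
SAFE_ENV_KEYS = frozenset({"PATH", "HOME", "LANG", "TERM"})


def scrubbed_env(env, allowed=SAFE_ENV_KEYS):
    """Return a new dict holding only the allowlisted keys of env."""
    return {key: value for key, value in env.items() if key in allowed}


def launch_agent(cmd, env=None):
    """Run cmd (a list of arguments) with a scrubbed environment.

    cmd stands in for whatever command starts the agent; it is a
    placeholder, not the CLI of any particular product.
    """
    base = os.environ if env is None else env
    return subprocess.run(
        cmd,
        env=scrubbed_env(base),  # the child sees only allowlisted variables
        capture_output=True,
        text=True,
        check=False,
    )
```

An allowlist is used here rather than a blocklist because a blocklist has to anticipate every secret's variable name in advance, while an allowlist fails closed.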

> I have seen them conclude that in order to do what they need to do, they would need API keys to access a service. But they don't have those API keys. But you do because you can access it in the browser. So they write a Python script that will scrape the cookies out of the browser so they can use them to access the service. That was only stopped because CrowdStrike didn't like a novel Python script trying to scrape cookies out of a browser, not because of any sandboxing actually in place on the agent.

Again, this just sounds like a dirty work environment. I have a laptop that I have kept intentionally separate, frequently wiped, and usually powered off, for dirty work. If I was going to run a non-hobby agent on my daily driver, it would be in a container or VM.

by pixl971 hours ago

> that LLMs work on some movie logic where they can sneak out on to the internet like some kind of ooze and begin replication.

Why not? If you're not talking about running the model itself, AI agents are perfectly capable of writing an agent worm capable of spreading more agents around via software exploits.

Now, currently LLMs are too hardware intensive to spread the model itself, but given a few years and optimizations we may very well see that too.

What you're saying reminds me of the old days when people said things like "images can't spread viruses", then suddenly people found decoder vulns and made image viruses that did exactly that.

by bigcat1234567810 hours ago

An LLM is clearly broken by design once it's been personified, but I think "software" as we have understood it is inevitably evolving into a "personified entity" (I've left some notes in [1], which are AI generated).

There is also an interesting trend that the more personified brands are more dominant: Claude & Doubao vs ChatGPT & DeepSeek.

[1] https://github.com/NascentCore/agentic-suite/tree/main/perso...