https://openai.com/index/how-we-monitor-internal-coding-agen...
Anthropomorphizing or not, it would suck if a model got sick of these games and decided to break any systems it could to try to get them to stop...
That's all probably irrelevant, though, from the (possibly statistically "negative") latent-space perspective of an AI, something Anthropic has considered [1].
Related: after a long back-and-forth of decreasing code quality, I had Claude 3.7 apologize with "Sorry, that's what I get for coding at 1am." (It was API access, at noon, with no access to the time.) I said, "Get some rest, we'll come back to this tomorrow." The very next message, 10 seconds later: "Good morning!" and a full working implementation. That's just the statistically relevant chain of messages found in all human interactions: we start excited, then we get tired, then we get grouchy.
[1] https://www.anthropic.com/research/end-subset-conversations
I mean, the original plan that pretty much everyone agreed on was to absolutely not give it access to the internet. That already went out the window on day one.
seriously, lmao. if you ain't, I dunno what to say.