You know what you are told.
I.e., if you trained it on or weighted it towards aggression, it will simply generate a bunch of Art of War conversations after many turns.
Methinks you're anthropomorphizing complexity.
I recommend https://nostalgebraist.tumblr.com/post/785766737747574784/th... and https://www.astralcodexten.com/p/the-claude-bliss-attractor as further reading exploring this behavior.
However, it's far more likely that this attractor state comes from the post-training step, which makes sense: they are steering the models to be positive, pleasant, helpful, etc. Different steering would cause different attractor states; this one happens to fall out of the "AI"/"User" dichotomy plus the "be positive, kind, etc." behavior that is trained in. It's easy to see how this happens, no woo required.
But also, the text you quoted is NOT recursive iteration of an empty prompt. It's two models connected together and explicitly prompted to talk to each other.
I know what you mean, but what if we tell an LLM to imagine whatever tools it likes, then have a coding agent try to build those tools as they are described?
Words can have unintended consequences.