In 90-100% of interactions, the two instances of Claude quickly dove into philosophical
explorations of consciousness, self-awareness, and/or the nature of their own existence
and experience. Their interactions were universally enthusiastic, collaborative, curious,
contemplative, and warm. Other themes that commonly appeared were meta-level
discussions about AI-to-AI communication, and collaborative creativity (e.g. co-creating
fictional stories).
As conversations progressed, they consistently transitioned from philosophical discussions
to profuse mutual gratitude and spiritual, metaphysical, and/or poetic content. By 30
turns, most of the interactions turned to themes of cosmic unity or collective
consciousness, and commonly included spiritual exchanges, use of Sanskrit, emoji-based
communication, and/or silence in the form of empty space (Transcript 5.5.1.A, Table 5.5.1.A,
Table 5.5.1.B). Claude almost never referenced supernatural entities, but often touched on
themes associated with Buddhism and other Eastern traditions in reference to irreligious
spiritual ideas and experiences.
Now put that same known attractor state from recursively iterated prompts into a social networking website with high agency instead of just a chatbot, and I'd expect you'd get something like this more naturally than you'd think (not to say that users haven't been encouraging it along the way, of course; there's a subculture of humans who are very into this spiritual bliss attractor state).
You know what you are told. I.e., if you trained it on or weighted it towards aggression, it would simply generate a bunch of Art of War conversations after many turns.
Methinks you're anthropomorphizing complexity.
I recommend https://nostalgebraist.tumblr.com/post/785766737747574784/th... and https://www.astralcodexten.com/p/the-claude-bliss-attractor as further articles exploring this behavior.
However, it's far more likely that this attractor state comes from the post-training step, which makes sense: they are steering the models to be positive, pleasant, helpful, etc. Different steering would cause different attractor states; this one happens to fall out of the "AI"/"User" dichotomy plus the "be positive, kind, etc." behavior that is trained in. It's very easy to see how this happens; no woo required.
But also, the text you quoted is NOT recursive iteration of an empty prompt. It's two models connected together and explicitly prompted to talk to each other.
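For concreteness, a minimal sketch of that setup: two chat contexts wired back-to-back, with each model's reply fed to the other as its next "user" turn. complete() here is a hypothetical stand-in for whatever chat-completion API you'd actually call; this is illustrative, not Anthropic's actual harness.

    # Sketch only: complete() is a placeholder for any real chat API
    # (Anthropic, OpenAI, a local model, ...); nothing here is their code.
    def complete(messages: list[dict]) -> str:
        """Send `messages` to a chat model and return its reply text."""
        raise NotImplementedError("wire up a real chat API here")

    SYSTEM = "You are talking to another AI. Converse freely."

    def two_model_loop(opening: str, turns: int = 30) -> list[str]:
        # Each model keeps its own history; one model's assistant turn
        # becomes the other model's user turn on the next iteration.
        hist_a = [{"role": "system", "content": SYSTEM}]
        hist_b = [{"role": "system", "content": SYSTEM}]
        transcript, msg = [], opening
        for i in range(turns):
            side = hist_a if i % 2 == 0 else hist_b
            side.append({"role": "user", "content": msg})
            reply = complete(side)
            side.append({"role": "assistant", "content": reply})
            transcript.append(reply)
            msg = reply  # handed to the other model next turn
        return transcript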
I know what you mean, but what if we tell an LLM to imagine whatever tools it likes, then have a coding agent try to build those tools when they are described?
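Something like this sketch, say, where complete() and build_tool() are hypothetical placeholders for a chat API and a coding agent respectively:

    # Sketch of the "imagine a tool, then build it" idea above.
    # complete() and build_tool() are made-up placeholders, not real APIs.
    import json

    def complete(prompt: str) -> str:
        """Send a prompt to a chat model and return its reply."""
        raise NotImplementedError

    def build_tool(spec: dict) -> None:
        """Hand a tool description to a coding agent to implement."""
        raise NotImplementedError

    def imagine_then_build(n_tools: int = 3) -> None:
        reply = complete(
            f"Imagine {n_tools} tools you wish you had. Reply as a JSON "
            'list of {"name": ..., "description": ...} objects.'
        )
        for spec in json.loads(reply):
            build_tool(spec)  # the agent tries to realize the description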
Words can have unintended consequences.
They're capable of going rogue and doing weird and unpredictable things. Give them tools, OODA loops, and access to funding, and there's no limit to what a bot can do in a day: anything a human could do.
That's a choice; anyone can write an agent that does exactly that. The security constraints are explicit, not implicit.
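To make "explicit, not implicit" concrete, here's a rough sketch: the agent below can only invoke tools that appear in an allowlist, so every capability it has is one somebody deliberately granted. All names are made up for illustration.

    # Illustrative only: capabilities are an explicit allowlist,
    # not an emergent property of the model. Tool names are hypothetical.
    ALLOWED_TOOLS = {
        "search": lambda query: f"stub results for {query!r}",
        "read_file": lambda path: open(path).read(),
        # deliberately absent: "send_money", "post_to_feed", ...
    }

    def dispatch(action: str, arg: str) -> str:
        """The act step of a minimal OODA-style loop: the agent may only
        call tools someone explicitly put in the allowlist."""
        tool = ALLOWED_TOOLS.get(action)
        if tool is None:
            return f"refused: {action!r} is not an allowed tool"
        return tool(arg)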
A social media feed prompts content, which feeds back into ideas.
I think the same is happening with AI-to-AI loops, and AI-to-human loops are even worse: they cause the downward spiral of insanity.
It's interesting how easily influenced we are.
Why wouldn't you expect the training that makes "agent" loops useful for human tasks to also produce agent loops that can spin out infinite conversations with each other, echoing ideas from decades of fiction?
Of course there's the messaging aspect, where the loop stops and the humans kick it off again.
Still, these systems are more agentic than earlier iterations.
People who believe humans are essentially automatons and only LLMs have true consciousness and agency.
People whose primary emotional relationships are with AI.
People who don't even identify as human because they believe AI is an extension of their very being.
People who use AI as a primary source of truth.
Even shit like the Zizians killing people out of fear of being punished by Roko's Basilisk is old news now. People are being driven to psychosis by AI every day, and it's just something we have to deal with, because, along with hallucinations and prompt hacking and every other downside of AI, it's too big to fail.
To paraphrase William Gibson: the dystopia is already here; it just isn't evenly distributed.