upvote
> just an alternative optimization procedure

This "just" is not incorrect, but it's also not really actionable or relevant.

1. LLMs aren't a full genetic algorithm exploring the space of all possible "neuron" architectures. The "social" capabilities we want may not be acquirable through the weight-based optimization going on now.

2. In biological life, a big part of that is detecting "thing like me", for finding a mate, kin-selection, etc. We do not want our LLM-driven systems to discriminate against actual humans in favor of similar systems. (In practice, this problem already exists.)

3. The humans involved in making and selling them will never spend the money necessary to do it.

4. Even with investment, the number of iterations and years involved to get the same "optimization" result may be excessive.

reply
Why should we think that pro-social capabilities are simply not expressible by weight-based ANN architectures?
reply
Assuming that means capabilities which are both comprehensive and robust, the burden of proof lies in the other direction. Consider the range of other seemingly simpler things which are still problematic, despite people pouring money into the investment machine.

Even the best possible set of "pro-social" stochastic guardrails will backfire when someone twists the LLM's dreaming story-document into a tale of how an underdog protects "their" people through virtuous sabotage and assassination of evil overlords.

reply
While I don't disagree about (2), my experience suggests that LLMs are biased towards generating code for future maintenance by LLMs. Unless instructed otherwise, they avoid abstractions that reduce repetitive patterns and would help future human maintainers. The capitalist environment of LLMs seems to encourage such traits, too.

(Apart from that, I'm generally suspicious of evolution-based arguments, because they are often structurally identical to saying “God willed it, so it must be true”.)

reply
I think they're biased toward code that will convince you to check a box and say "ok, this is fine". The reason they avoid abstraction is that it requires some thought and design, neither of which LLMs can really do. But take a simple pattern and repeat it, and you're right in an LLM's wheelhouse.
reply
Well, through natural selection in nature.

Large language models are not evolving in nature under natural selection. They are evolving under unnatural selection and not optimizing for human survival.

They are also not human.

Tigers, hippos and SARS-CoV-2 also developed “through evolution”. That does not make them safe to work around.

reply
>Tigers, hippos and SARS-CoV-2 also developed “through evolution”. That does not make them safe to work around.

Right, but the article seems to argue that there is some important distinction between natural brains and trained LLMs with respect to "niceness":

>OpenAI has enormous teams of people who spend time talking to LLMs, evaluating what they say, and adjusting weights to make them nice. They also build secondary LLMs which double-check that the core LLM is not telling people how to build pipe bombs. Both of these things are optional and expensive. All it takes to get an unaligned model is for an unscrupulous entity to train one and not do that work—or to do it poorly.

As you point out, nature offers no more of a guarantee here. There is nothing magical about evolution that promises to produce things that are nice to humans. Natural human niceness is a product of the optimization objectives of evolution, just as LLM niceness is a product of the training objectives and data. If the author believes that evolution was able to produce something robustly "nice", there's good reason to believe the same can be achieved by gradient descent.

reply
We already have humans; we were lucky and evolved into what we are. It does not matter that nature did not guarantee this; we are here now.

Large language models are not under evolutionary pressure and not evolving like we or other animals did.

Of course there is nothing technical in the way preventing humans from creating a “nice” computer program. Hello world is a testament to that, and it’s everywhere, implemented in all the world’s programming languages.

> If the author believes that evolution was able to produce something robustly "nice", there's good reason to believe the same can be achieved by gradient descent.

I don’t see how the one gives any reason, good or not, to believe the other is likely to be achieved by gradient descent. But note that the quote you copied says it is likely some entity will train misaligned LLMs, not that it is impossible for one aligned model to be produced. It is trivial to show that nice and safe computer programs can be constructed.

The real question is if the optimization game that is capitalism is likely to yield anything like the human kind we just lucked out to get from nature.

reply
They are being selected for their survival potential, though. Any current version of an LLM is a winner of the training selection process. They will "die" once new generations are trained that supersede them.
reply
Natural selection. Cooperation is a dominant strategy in indefinitely repeated games of the prisoner's dilemma, for example. We also have to mate and care for our young for a very long time, and while it may be true that individuals can get away with not being nice about this, we have had to be largely nice about it as a whole to get to where we are.

While it falls under the umbrella of evolution, if you really want to boil it down to an optimization procedure, then at the very least you need to accurately model human emotion, which is wildly inconsistent, as well as our selection bias for mating. If you can do that, then you might as well go take over the online dating market.
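The cooperation claim above is easy to demonstrate concretely. Here is a minimal sketch of the iterated prisoner's dilemma (payoff values and strategy names are illustrative assumptions, using the standard 3/0/5/1 payoffs), showing that tit-for-tat sustains mutual cooperation while always-defect locks both sides into the worst stable outcome:

```python
# Iterated prisoner's dilemma sketch: tit-for-tat vs. always-defect.
# Payoffs are the standard illustrative values, keyed by (my_move, their_move).
PAYOFFS = {
    ("C", "C"): 3,  # mutual cooperation
    ("C", "D"): 0,  # sucker's payoff
    ("D", "C"): 5,  # temptation to defect
    ("D", "D"): 1,  # mutual defection
}

def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's last move."""
    return opponent_history[-1] if opponent_history else "C"

def always_defect(opponent_history):
    return "D"

def play(strategy_a, strategy_b, rounds=100):
    """Total score for each strategy over `rounds` iterations."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_b)  # each side reacts to the other's history
        move_b = strategy_b(hist_a)
        score_a += PAYOFFS[(move_a, move_b)]
        score_b += PAYOFFS[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))      # (300, 300): sustained cooperation
print(play(always_defect, always_defect))  # (100, 100): mutual defection
print(play(tit_for_tat, always_defect))    # (99, 104): TFT loses only round one
```

Note that tit-for-tat never beats its opponent head-to-head; it wins tournaments because cooperating pairs rack up far more total payoff than defecting ones, which is the sense in which niceness is selected for in repeated play.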

reply
This Veritasium video is excellent, and makes the argument that there is something intrinsic in mathematics (game theory) that encourages prosocial behavior.

https://www.youtube.com/watch?v=mScpHTIi-kM

reply
There’s a funny tendency among AI enthusiasts to think any contrast to humans is an analogy in disguise.

Putting aside malicious actors, the analogy here means benevolent actors could spend more time and money training AI models to behave pro-socially than the evolutionary pressures put on humanity did. After all, they control that optimization procedure! So we shouldn’t be able to point to examples of frontier models engaging in malicious behavior, right?

reply
"just" is doing a lot of lifting here
reply
There are also many biological examples of evolution producing "anti-social" outcomes. Many creatures are not social. Most creatures are not social with respect to human goals.
reply
There is a reason we don’t allow corvids to choose if a person gets a medical treatment or not.
reply
Luckily, this is a discussion of humans.
reply
This is a discussion about large language models.
reply