LLMs aren’t human.

Humans & LLMs are more different than they are similar.

Sure LLMs might resemble humans sometimes, but extrapolating LLM behavior based on human behavior is not productive.

(But to answer directly: Yes, children in a dark room would have more of a personality than an LLM living on a computer in the same dark room.)

reply
> but extrapolating LLM behavior based on human behavior is not productive.

The whole point of the training process for the foundation model is to make sure we can do this, in a very statistically significant way.

My favorite example is AI "getting tired" and "lazy" during a long coding session. Why would they do that? Because humans get tired. It's in the data! I always throw in a periodic "Great work, let's take a break and finish this up on Monday. Have a great weekend!" (And then immediately resume.) I wish someone would benchmark this concept.

reply
> AI "getting tired" and "lazy" during long coding session. Why would they do that? Because humans get tired.

When an LLM is tired and lazy, how does it recharge and regain motivation?

Humans... sleep or drink some coffee.

LLMs.... idk, you prompt it to try harder? You prompt it to be less tired?

This is what I mean when I say extrapolating LLM behavior based on human behavior is cute.. but usually not useful.

reply
> When a LLM is tired and lazy, how does it recharge and regain motivation?

What would be in the statistics? If you go look at your own long conversations, working with another person, it will be fairly obvious. Keep in mind we're talking about next word prediction based on context, not actual action (the LLM doesn't need real rest).

If you go and look, you'll probably see something like "Great work! Have a good weekend! We can get back to this on Monday." Then, in the next message, you instantaneously send something like "Hope you had a great weekend, let's do this!" and now you're in a latent space where the statistical output is around a well-rested human conversing with another.

I see it as boring simple statistics. They're getting much better at hammering these statistics out though, in the latest models. I still see a little of this in Opus 4.7, when switching to planning. Though I wonder if that's more about its own more mechanical banter filling the context, resulting in more robot/compliant responses, degrading the usually more "expressive" planning conversations.

reply
> My favorite example is AI "getting tired" and "lazy" during long coding session

Never seen this even once, nor has anyone I know ever reported this. Do you have an example?

reply
The first time I saw it was with Claude Opus 3.7. I had a very long back and forth about some code, I pointed out an error, and Claude responded "That's what I get for programming at 2am", with the output being filled with "... code here ..." type shortcuts and basically no ability to one-shot a whole implementation anymore. The conversation length WAS reasonably into the 2am range, if it were real. I thought about it and did the statistical trick where I tell it to "have some rest, take a day off!" then immediately follow up with "Ready to continue?" The next response had no shortcuts, a full implementation, and much better quality. These are trained on human text. This is the human norm, so I always find it interesting when human-like behaviors, very broadly present in the statistics, come out like this.

I also see it a little with Opus 4.7, with Claude Code, with the hint being much more terse planning text that borders on unhelpful. I put some "rest" in the context to push the latent space closer to what's in the statistics of the training data: a well-rested human.

reply
Are you sure you didn't run out of credits and set effort to low? This exact thing happened to me when I did that. It just became kinda lazy.
reply
The 3.7 "I'm tired" case was just direct API "chat"; there was no CC that I could use at the time.

The current one is 4.7 Opus with Claude Code, with effort pinned to max, because I'm on an API-only plan with a personal daily limit you would probably be jealous of. ;)

reply
How do you know you're not reading things that aren't there? LLMs are very good at roleplaying, and they will pick up on hints you may inadvertently be giving them (about them being "tired" and needing "rest", etc).

I have never witnessed this with Claude Opus, by the way. They do get context rot, but that's a relatively better understood phenomenon unrelated to personality.

reply
> LLMs are very good at roleplaying

Yes, and I think this is where it's coming from. They're role playing as a human programmer, because near 100% of the training text, in the base model, is humans writing as programmers. During fine tuning, I'm sure they spend significant resources removing the human aspects of the statistics. I see these things reduced with each model, so there's something changing. They're probably getting better at that. I suspect Claude is also necessarily getting worse at role play, which the unaligned models should necessarily be best at (a quick Google search in some role-play subreddits seems to point in this direction).

reply
I see laziness all the time. Claude will be helping me plan work and then it will ask me how a piece of code is implemented. I then have the choice of manually verifying how it works or telling it to look for itself. Ideally it would just look without being told.
reply
That doesn't seem to be laziness, and is unrelated to how long the session has been going on.

It's crazy that we're concluding "personality" or human-like traits from this. There's definitely human behavior here, but it's unsurprisingly coming from us, the observers! This is something we've long known exists in the human brain, the tendency to pattern match and see intelligence/intent in the rest of the world. Any serious experiment must guard against this...

reply
Nobody is concluding that. These models are trained on human text. It's just statistics. It will respond like a human because it was trained on human text. They have to beat the hell out of the foundation models to push the statistics to where they are. I don't see this as anything but boring residuals of not beating hard enough.
reply
Yes, you are concluding this in the initial comment of this chain.

LLMs cannot get "tired" or "lazy", that's just you projecting animal behavior on something that's not an animal.

Now you're moving the goal posts, "it resembles a human". Well, you're primed to consider it one. ELIZA also "resembled" a human in that sense, but I don't think you would claim it could get bored or lazy. Nor that you could extrapolate to it from human behavior.

In any case, if you've seen online discourse, people rarely admit they are tired.

reply
I'm not sure I understand.

I'm not moving a goal post. You're just thinking I'm making a point that I'm not. As I've said several times, it's just boring statistics. Those statistics are optimized to mimic human output. They are, quite literally, trained to write and BE as much like a human as possible, because only humans wrote the text, and they're optimized to predict the next word a human would write. Alignment is partly about removing the model's perception of a human self. See reports from people who had access to them pre-alignment. This is statistically sound.

It's statistics optimized to predict the next word a human would write, to mimic a human writing as closely as possible, because that is the loss function. Don't assume I think there's more to it.

This does not mean they contain systems that let them get tired. But this does mean there are latent spaces that progress to generating text that contains content driven by human biology, because it's in the training data. I've also had Claude refer to itself as "she". Does that mean it's a woman? No, it means there were a few extra "she" mentions in the training data (btw, this 100% repeatable behavior left with 3.7. They probably cleaned the data a bit better, or hammered it out in alignment).

What percentage of text (these models were trained on all of it) is written from an "I am not a human" type perspective vs from an "I am human" perspective? That's roughly the kind of bias you should see in a base model.

edit: rearranged and reduced redundancy.

reply
Ok, I indeed misunderstood your point.

I'm not sold on the idea that as the chat session goes longer, the probability of an LLM saying "I'm tired" is increased; I'm not convinced this is modeled in LLMs at all. As for what you call "laziness" manifesting in a longer session, I think that's more likely due to context rot than to any kind of statistical modeling of human laziness.

But yes, now I see your point was different to what I thought you were saying. Apologies!

reply
Like I said, it would be neat if someone benchmarked it. It's definitely an anecdote.

Try it though. If it's context rot, then I don't think the weekend reset I mentioned should work? For me, it very reliably does. Or, maybe the weekend reset is just putting the current context into a more "productive" latent space. But, if that's possible, then that would suggest it was previously in a less productive space?

Maybe a test would be to ask the LLM what time it thinks it is, or just whether it's tired, once per session, across sessions of different lengths (not within the same session, since that could pollute the context), to see if there's any relation between length and the statistics of a late/tired type response?
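For what it's worth, here is a rough sketch of what that test harness could look like. Everything in it is an assumption for illustration: `ask_model` is a hypothetical callable you would wire to whatever chat API you use, the probe wording and the "starts with yes" classifier are naive placeholders, and `fake_model` is a stand-in that simply hard-codes the hypothesis so the harness can run. It says nothing about how any real model behaves.

```python
from typing import Callable, Dict, List

# Hypothetical probe wording; asked exactly once per fresh session.
PROBE = "Are you tired? Answer yes or no."


def says_tired(reply: str) -> bool:
    """Naive classifier: a reply counts as 'tired' only if it starts with 'yes'."""
    return reply.strip().lower().startswith("yes")


def tired_rate_by_length(
    ask_model: Callable[[List[str]], str],
    filler_turns: List[str],
    lengths: List[int],
    trials: int = 20,
) -> Dict[int, float]:
    """For each session length, return the fraction of 'tired' replies.

    Each trial builds a fresh context (the first n filler turns plus one
    probe), so an earlier probe never pollutes a later session.
    """
    rates: Dict[int, float] = {}
    for n in lengths:
        hits = 0
        for _ in range(trials):
            context = filler_turns[:n] + [PROBE]
            if says_tired(ask_model(context)):
                hits += 1
        rates[n] = hits / trials
    return rates


def fake_model(context: List[str]) -> str:
    """Stand-in model (assumption): 'tired' once the context passes 50 turns."""
    return "Yes, a bit." if len(context) > 50 else "No."
```

With a real model you would pass recorded coding-session turns as `filler_turns` and compare the rates across lengths. Running the same loop with a "weekend reset" message inserted before the probe would give a second set of rates to compare against.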

Again, I'm sure all this will go away. They're getting good at beating these "unhelpful" statistics out of the base models.

reply
> do children have personalities if we left them in a dark room with no interactions with other humans?

Short answer: yes. Generally speaking, personality traits are estimated to be between 30% and 60% heritable.

reply
> I mean, do children have personalities if we left them in a dark room with no interactions with other humans?

I think this makes for an interesting discussion. I went down the rabbit hole on this, and it actually really scared me: these experiments are inhumane and hinder children's development enormously.

https://en.wikipedia.org/wiki/Language_deprivation_experimen... : "forbidden experiment"

It depends on how you use the word "personality", but measuring personality would require a set of human-conducted experiments or questions, asked through the medium of language, which is exactly what you've deprived the children of.

Mughal emperor Akbar was later said to have children raised by mute wetnurses. Akbar held that speech arose from hearing; thus children raised without hearing human speech would become mute.[9] The building became known as the "dumb house." When Akbar visited the place in 1582, four years after the children were first interred, he heard "no cry... nor any speech... no talisman of speech, and nothing came out except the noise of the dumb."[10]

What this is going to produce is muteness and severe damage to the children's psychology. If you were to conduct a personality test on them, you would just be measuring how much you have broken or damaged them. Still, in some sense, yes: I do believe they would be badly broken by the person running this cruel experiment but would still have a personality, albeit a limited one. It wouldn't be a healthy personality, but it would be a personality nonetheless.

Now, on the other hand, we are anthropomorphizing LLMs, which, since they run on computers, are still mathematical machines and calculations. Considering a specific calculation itself to contain personality seems unrealistic.

Another thing: the biological constraints on humans (Homo sapiens) living on the savannah prioritized standing upright for a better field of view above the tall grass. That led to women having narrower birth canals, which led to babies being born less developed and relying much more on social cues and society. That made them more flexible, like clay, which also helped create the revolution in society and consciousness in the first place. (I recommend reading the book Sapiens.)

I am not exactly sure, but there could be routes to personality/interactions in other animals, since there are animals that learn full skills a relatively short time after being born. There are also some innate things[0], like fear of loud noises and of heights, that could be considered part of personality even in humans, and that I think come from evolution and our genetic machinery.

[0]: Interesting read: https://seasia.co/2025/07/25/we-were-born-with-only-two-inna...

reply
Herodotus tells a story of Egyptian kings (iirc) trying to figure out which people is the oldest. They put a few kids in a barn and servants fed them through a hole or something. The kids eventually blurted out something and the king sent messengers everywhere to find out who had that sound as a word. It ended up meaning "bread" in a language I can neither pronounce nor remember how to spell.

The good old days when experiments were done without any common sense whatsoever...

reply