The real meaning of accountability is that you can fire one if you don't like how they work. Good news! You can fire an AI too.
It's similarly reasonable to drop a tool that's unreliable, though I don't think that's an accurate description of what happened here. Instead, they used a tool that is generally known to be unpredictable and failed to sandbox it adequately.
The cold hard fact is: LLMs are an unreliable tool, and using them without checking their every action is extremely foolish.
You mean checking every action they take outside the sandbox, I suppose? Otherwise I'd consider any attempt at letting an agent do some work foolish.
At least for now.
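To make that concrete, here is a minimal sketch of an approval gate, assuming a hypothetical agent that proposes shell commands and a fixed sandbox directory (the path and the confirmation flow are invented for illustration, not any particular framework's API): actions inside the sandbox run automatically, anything outside it waits for a human.

```python
# Hypothetical approval gate: auto-run actions confined to a sandbox,
# require explicit human confirmation for anything outside it.
import subprocess
from pathlib import Path

SANDBOX = Path("/tmp/agent-sandbox").resolve()  # assumed sandbox root

def is_inside_sandbox(workdir: Path) -> bool:
    """True if the working directory is the sandbox root or a subdirectory of it."""
    resolved = workdir.resolve()
    return resolved == SANDBOX or SANDBOX in resolved.parents

def run_agent_action(command: str, workdir: Path) -> None:
    if not is_inside_sandbox(workdir):
        # Outside the sandbox: every single action gets reviewed by a human.
        answer = input(f"Agent wants to run {command!r} in {workdir}. Allow? [y/N] ")
        if answer.strip().lower() != "y":
            print("Action rejected.")
            return
    subprocess.run(command, shell=True, cwd=workdir, check=False)
```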
No, we are not born with all the pre-training we need. That is rather the point of education, teaching people's brains how to process information in new, maybe unintuitive ways.
And conversely, if a person makes a series of impulsive, damaging decisions, they probably will not be able to accurately explain why they did it, because neither the brain nor our physiology is tuned to permit it.
Seems pretty much the same to me.
What do you mean by fire? And how is the accountability similar to an employee?
There is no internal monologue with which to have introspection (beyond what the AI companies choose to hide as a matter of UX or what have you). There is no "I was feeling upset when I said/did that" unless it's in the context.
There is no ghost in the machine that we cannot see before asking.
Even if a model is able to come up with a narrative, it's simply that. Looking at the log and telling you a story.
Maybe. How do you tell? What would you expect to be different if they didn't?
> The LLM literally cannot possibly have a deeper insight into the root cause than the user, because it can only work from the information that the user has access to.
Insight is not solely a function of available input information. Arguably being able to search and extract the relevant parts is a far more important part of having insights.
I think you're asking how I would know if other people were P-zombies. That's an inappropriate question because I didn't talk about subjective experience, just about internal state. There's no question about whether other people have internal states. I can show someone a piece of information in such a way that only they see it and then ask them to prove that they know it such that I can be certain to an arbitrarily high degree that their report is correct.
Unvoiced thoughts are trickier to prove, but quite often they leave their mark in the person's voiced thoughts.
>Insight is not solely a function of available input information. Arguably being able to search and extract the relevant parts is a far more important part of having insights.
LLMs are notoriously bad at judging relevance. I've noticed quite often that if you ask a somewhat vague question, they try to cold-read you by throwing out various guesses to see which one you latch onto. They're very bad at interpreting novel metaphors, for example.
In fact, talking about "thinking" at all is already the wrong direction to go down when trying to triage an incident like this. "Do not anthropomorphize the lawnmower" applies to AI as much as it does to Larry Ellison.
If thinking is the wrong direction to go down, then it is also the wrong direction to go down when talking about humans.
Sometimes I think we're too eager to compare ourselves to them.
But are their explanations for how they behaved any more compelling than those of people who have? If so, why?
LLMs are lacking layers of awareness that humans have. I wonder if achieving comparable awareness in LLMs would require significantly more compute, and/or would significantly slow them down.
I argue that the model has no access to its thoughts at the time.
Split-brain experiments notwithstanding, I believe that I can remember what my faulty assumptions were when I did something.
If you ask a model “why did you do that?” it is literally not the same “brain instance” anymore; it can only construct reasons retroactively from whatever context was recorded (its chain of thought, for example).
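A rough sketch of why that is, assuming a generic chat-completion-style interface (the `generate` callable here is a stand-in, not a real API): the follow-up "why?" question is just another forward pass whose only input is the recorded transcript.

```python
# Sketch: asking "why did you do that?" spawns a fresh forward pass over the
# logged transcript. The activations that produced the earlier answer are gone;
# only whatever was written into the context (e.g. a recorded chain of thought)
# is available, so any explanation is reconstructed after the fact.
from typing import Callable

Transcript = list[dict[str, str]]  # e.g. [{"role": "user", "content": "..."}]

def ask_why(generate: Callable[[Transcript], str], transcript: Transcript) -> str:
    followup = transcript + [{"role": "user", "content": "Why did you do that?"}]
    return generate(followup)  # a retroactive narrative, not remembered reasoning
```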
You got the wrong takeaway from your link.
This is falsified by that study, which shows that generalized introspection does exist in frontier models. It isn't consistent, but it is demonstrable.
"no access" vs. "limited access"
You cannot trust that the model has introspection, so for all intents and purposes, for the end user, it doesn't.
I suspect you’re making assumptions that don’t hold up to scrutiny.
You appear to be defaulting to the assumption that LLMs and humans have comparable thought processes. I don't think it's on me to provide evidence to the contrary but rather on you to provide evidence for such a seemingly extraordinary position.
For an example of a difference, consider that inserting arbitrary placeholder tokens into the output stream improves the quality of the final result. I don't know about you but if I simply repeat "banana banana banana" to myself my output quality doesn't magically increase.
It is known that the narrative part of the brain is separate from the decision-making part. If someone asks you, in a very convincing, persuasive way, why you did something a year ago that you can't clearly remember doing, you can become positive that you did it anyway. And then the mind simply hallucinates a reason. That's a trait of brains.
Yes, brains can hallucinate reasons, but that doesn't mean they always do. If all the reasons given were hallucinations, then introspection would be impossible, but clearly introspection does help people.
There is no misinformation in what I wrote.