"Just don't accidentally forget to do the thing that makes it safe" is not a very effective strategy for something that so many vested interests are trying to push into all corners of society. If it's so easy to misuse it, then it shouldn't be used in any context outside of where there are no major consequences for bad output and there's amble opportunity and ability to validate it
reply
Not really. They're still non-deterministic language predictors. Believing that a prompt is an effective way to actually control these machines' behavior is really far-fetched.

They come like that from the factory. Hardcoded to never say no.

reply
They're not hardcoded to never say no, but some of the models were trained to be "yes men" because their creators thought it would be a good property to have. GPT-4o for example.
reply
> non deterministic language predictors.

Non?? Only those with sh*tty code, surely.

There's nothing inherently non-deterministic about inference.
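A toy sketch of the point (plain NumPy, made-up logits, not any particular model's API): greedy decoding is a pure argmax and produces the same token every run, and even sampling is reproducible if you pin the seed.

    import numpy as np

    def softmax(logits):
        z = np.exp(logits - logits.max())
        return z / z.sum()

    # Made-up "next-token" logits standing in for a model's output.
    logits = np.array([2.0, 1.0, 0.5, -1.0])

    # Greedy decoding: a pure argmax, identical on every run.
    greedy_token = int(np.argmax(logits))

    # Sampling is only non-deterministic if you let it be; with a
    # fixed seed the draw is reproducible too.
    rng = np.random.default_rng(seed=0)
    sampled_token = int(rng.choice(len(logits), p=softmax(logits)))

(In practice, floating-point non-associativity and batched GPU kernels can introduce small run-to-run differences, but that's an implementation detail, not anything inherent to inference.)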

reply
The thing is that they are completely incapable of meta-cognition. Reasoning models don’t show their actual reasoning at all.
reply
Right — they're not reasoning, they're generating text that statistically models reasoning. Anyone who says differently is selling something.
reply
As the meme goes, "they are the same picture".
reply
Language has reasoning encoded within it.
reply
It certainly does. But so too do complex neural network functions, as do attention mechanisms.
reply
That is what a base model does. After RL it is a very different thing, and anyone who says they know what it is, is naive or dishonest. These things are grown, not made, and we really do not understand how they work in many important ways.
reply
Yeah, but they’re not magic; we can still do experiments and see what happens. Anthropic did a lot of work on this and showed that they’re not accurately describing their reasoning process.
reply
Of course, the fact that they have to do that proves my point.
reply
Believing that a prompt is not an effective way to actually control their behavior is obviously incorrect to anyone who's actually used these things.

It's not a guaranteed way to control their behavior, but you can more than move the needle.

reply
The word most relevant to this conversation is “influence.” Influence is possible and users observe it and use it to increase margins of useful outcomes. “Control” is incorrect.
reply
Yeah, that distinction is pretty important, and I believe that IS the point that guy is making: if you cannot control it with guaranteed outcomes, you cannot control it.
reply
You can't control it any more than you can control a draw from a deck of cards, but you can absolutely control the deck of cards that you choose to draw from.
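To make the analogy concrete, here's a rough sketch (toy NumPy, made-up numbers) of "choosing the deck": temperature and top-k don't pick the token that comes out, but they decide which tokens are even in the deck and how heavily each one is weighted.

    import numpy as np

    def shaped_distribution(logits, temperature=0.7, top_k=3):
        # Build the "deck" we draw from: rescale, keep top-k, renormalize.
        scaled = logits / temperature            # sharpen or flatten preferences
        keep = np.argsort(scaled)[-top_k:]       # only the k likeliest tokens stay in the deck
        masked = np.full_like(scaled, -np.inf)
        masked[keep] = scaled[keep]
        z = np.exp(masked - masked[keep].max())
        return z / z.sum()

    logits = np.array([3.0, 2.5, 0.2, -1.0, -2.0])  # made-up next-token scores
    deck = shaped_distribution(logits)
    token = int(np.random.default_rng().choice(len(deck), p=deck))  # the draw is random; the deck wasn't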
reply
The problem is that nobody really does that? Like, as far as I'm aware, even simple stuff such as not considering tokens that would result in a syntax error when writing code isn't being done.
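For what it's worth, the mechanism for that kind of constraint is conceptually simple: before sampling, mask out every token an incremental parser would reject. A hand-wavy sketch, where is_valid_next_token is a hypothetical stand-in for a real parser:

    import numpy as np

    def mask_invalid_tokens(logits, vocab, prefix, is_valid_next_token):
        # Give zero probability to any token the parser rejects after `prefix`.
        masked = logits.copy()
        for i, tok in enumerate(vocab):
            if not is_valid_next_token(prefix, tok):   # hypothetical syntax check
                masked[i] = -np.inf
        z = np.exp(masked - masked.max())
        return z / z.sum()

    # Toy example: right after "def f(", a closing brace is never valid Python.
    vocab = [")", ":", "}", "x"]
    logits = np.array([1.0, 0.5, 2.0, 0.8])
    probs = mask_invalid_tokens(logits, vocab, "def f(",
                                lambda prefix, tok: tok != "}")
    # "}" now has probability 0; the model literally can't emit it.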
reply
Magicians can probably make you change your mind on the former.
reply
That's silly. My car is not absolutely guaranteed to turn left when I turn the steering wheel left, but you wouldn't say I can't control my car on that basis.

Steering an LLM with a prompt is way less reliable than steering a car with a steering wheel, but there's still control. It's just not absolute.

reply
If your car doesn't turn left when you turn the steering wheel left, the problem is that the car is broken. If an LLM does something unexpected after you gave it instructions, that can happen even when the LLM is functioning entirely correctly.
reply
Nothing in this world is guaranteed, but that doesn't mean it's uniformly random either. LLMs can still do something unexpected even when you give them clear instructions, but that doesn't mean the result will be arbitrary and unpredictable in scope. It's the same way C/C++ undefined behavior technically means a program can give you nasal demons, but in reality it won't do anything that unusual (like format your C: drive) unless someone purposefully coded it to do that.
reply
This is all going to flash through your mind when your car mysteriously doesn't turn left. I would prefer to think of machines as things with defined outputs, where failure is failure, rather than as fluffy little kittens that might do the wrong thing, especially if the consequences are going to fall on someone who doesn't deserve it.
reply