> What was being discussed wasn't some specific molecular process, it was the false premise "oh molecules move around randomly so your ceiling might just collapse of its own accord because the beam decided to randomly disintegrate". That's not something that happens.

Except it does happen. That's why buildings get condemned, and why they eventually turn to rubble.

To the exact point: I have a product from a couple of years ago that uses an old OpenAI model. It's still running, and all it does is write a personality report based on test scores. I can't update the model without seriously rewriting the entire prompt system, but the model itself has degraded over the years. Ergo, my product has degraded of its own accord, and there is almost nothing I can do about it. My only option is to finagle newer models into giving the correct output, but they hallucinate at much higher rates than the older ones.

reply
> I could say something rude here about both mistakes being made by the same person, but since you brought it up I won't.

I'd encourage you to desist from rudeness, not just when people point it out to you, but at all times.

> You said "The sequence of tokens that would destroy your production environment can be produced by your agent, no matter how much prompting you use". This is analogous to "the ceiling could just collapse on you due to random molecular motion, no matter how much maintenance you do or what materials you use".

If prompt engineering is effective (analogous to performing the necessary maintenance and selecting the correct materials), I'm curious what your explanation is for the incident in the article?

reply
> I'd encourage you to desist from rudeness, not just when people point it out to you, but at all times.

I desire neither to be inauthentic, nor to suppress my emotions.

> If prompt engineering is effective (analogous to performing the necessary maintenance and selecting the correct materials), I'm curious what your explanation is for the incident in the article?

Keeping with the analogies, the original article doesn't say whether they built the roof properly or whether they just used a few screws to hold up a piece of quarter-inch plywood and called it a day.

It's no surprise that a terribly built roof may fall down. It's possible to get shoddy materials from a supplier without knowing.

Calling a curl command isn't something the model's training would flag as "this deletes things, don't do it". The fact that this happened is not, to me, evidence that the model might equally have run `sudo rm -rf --no-preserve-root /` under similar circumstances.
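To make that concrete, here's a hypothetical sketch (the endpoint and path are invented): unlike `rm -rf`, whose destructiveness is written on its face, a curl call reads identically whether it fetches or deletes; only the HTTP verb differs.

```shell
# Hypothetical illustration: the same curl syntax covers a harmless read
# and a destructive delete, so nothing in the surface tokens of the
# command screams "this deletes things". Endpoint is made up.
URL="https://api.example.com/v1/databases/prod"
for verb in GET DELETE; do
  # Printing instead of executing; a real agent would run: curl -X "$verb" "$URL"
  echo "curl -X $verb $URL"
done
```

The point is that an agent's training data associates obvious danger with commands like `rm -rf`, while a REST delete looks like every other API call it has seen.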

It sounds like the phrase "NEVER FUCKING GUESS!" was in the prompt as well, which could easily encourage the model towards "be sure of yourself, take action" instead of the "verify" that was meant.

As mentioned elsewhere in this thread, the fact that the article focuses so strongly on "the model confessed! It admitted it did the wrong thing!" doesn't lead me to put much stock in the author's ability to be cautious.

reply