If this incident gets into its training data, then it's highly likely it will repeat the mistake, complete with the same confession, since this is a text predictor, not a thinker.
I remember this being discussed when a similar issue went viral: someone was building a product using Replit's AI and it deleted his prod database.
Yet, since I'm also a human being who can work to understand the mistake myself, the probability that I can expect a correction of the behavior is much higher. I have found that it helps significantly if there's an actual, reasonable paycheck on the line.
As opposed to the language model, which demands that I drop more quarters into its slots and then hope for the best. An arcade model of work if there ever was one. Who wants that?
In my experience, this isn't true. At least with a version or two ago of ChatGPT, I could make it trip up on custom wordplay games. When called out, it would acknowledge the failure and explain how it had failed to follow the rules of the game, then proceed to make the same mistake a couple of sentences later.