> Even if you instruct the model "don't do X" or "do X this way"—you cannot rely on the model following that instruction.

Why not? I can definitively fire off two prompts to the same model and harness, one that includes "don't do X" and one that doesn't, and I get what I expect: the run without the instruction makes no attempt to avoid X, and the run with it does. Is that not your experience using LLMs?
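
To make that comparison concrete, here is a rough sketch of what I mean. Everything in it is hypothetical: `generate` stands in for whatever call your harness makes to the model, `violates` is whatever check tells you the output did X, and `fake_model` is a toy stub (not a real LLM) that only exists so the snippet runs on its own.

```python
from typing import Callable, Tuple


def violation_rate(generate: Callable[[str], str],
                   prompt: str,
                   violates: Callable[[str], bool],
                   trials: int = 20) -> float:
    """Fraction of `trials` completions for which `violates` returns True."""
    hits = sum(1 for _ in range(trials) if violates(generate(prompt)))
    return hits / trials


def compare(generate: Callable[[str], str],
            base_prompt: str,
            instruction: str,
            violates: Callable[[str], bool],
            trials: int = 20) -> Tuple[float, float]:
    """Return (rate without the instruction, rate with the instruction)."""
    without = violation_rate(generate, base_prompt, violates, trials)
    with_instruction = violation_rate(
        generate, f"{base_prompt}\n{instruction}", violates, trials
    )
    return without, with_instruction


def fake_model(prompt: str) -> str:
    """Toy stand-in for a model call (assumption): it obeys the instruction
    whenever the instruction is present, which a real model may not."""
    if "Do not use eval" in prompt:
        return "result = int(user_input)"
    return "result = eval(user_input)"


rates = compare(
    fake_model,
    base_prompt="Parse the number in user_input.",
    instruction="Do not use eval.",
    violates=lambda out: "eval(" in out,
    trials=5,
)
```

With a real model behind `generate` you would want many more trials, since the outputs are sampled and a single pair of runs tells you little about how reliably the instruction holds.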

reply
It depends on the instruction, and on how many other instructions there are. Models converge on doing things the way that emerged from their training, and with every turn the model pays less and less attention to your instructions. In practice, this means that after you have had the model plan and then execute the plan, you almost always end up having to iterate on the output, because while generating it the model began to derail and ignore instructions. You get things like "In a real app, we would do X, for now, just return null" or various subtle bugs.

It makes sense if you remember that it just predicts what the next piece of text should probably be.

reply
I understand how they work, as I work with them every day and have been doing so for two years or so. What I don't understand is how what you're saying is in any way related to the whole "deliberately create errors in code" part, which is where I jumped into the discussion.

Maybe I'm missing some bigger picture you're trying to paint here? I understand (and see) them making "mistakes" all the time, and I guess you could argue it's deliberate in some way, because it's simply how they work, and adjusting the prompt and redoing it usually solves the problem. But I'm afraid I don't see how it's connected, at least not yet.

reply
Nope, no bigger picture. That's all I meant.