by accuracy I meant how close is the output to your expectations, for example if you ask 8B model to write C compiler in C, it outputs theory of how to write compiler and writes pseudocode in Python. Which is off by 2 measures: (1) I haven't asked for theory (2) I haven't asked to write it in Python.
Or if you want to put it differently, if your prompt is super clear about the actions you want it to do, is it following it exactly as you said or going off the rails occasionally