Are there technical reasons why you can't make the "source" of each token (system prompt, user prompt, model thinking output, model response output, tool call, tool result, etc.) part of the feature vector - or even treat it as a different "modality"?
Or is this already being done in larger models?
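To make the question concrete, here is a minimal sketch of what I mean, similar in spirit to the segment embeddings that BERT adds to its token embeddings. Everything in it is an assumption for illustration: the `Source` labels, the `embed` helper, the toy sizes, and the random tables standing in for learned weights. I'm not claiming any particular production model works this way.

```python
import random
from enum import IntEnum


class Source(IntEnum):
    """Hypothetical source labels, taken from the list in the question."""
    SYSTEM_PROMPT = 0
    USER_PROMPT = 1
    MODEL_THINKING = 2
    MODEL_RESPONSE = 3
    TOOL_CALL = 4
    TOOL_RESULT = 5


def make_table(rows, dim, rng):
    """Random embedding table; in a real model these would be learned."""
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(rows)]


def embed(token_ids, sources, token_table, source_table):
    """Add a per-source vector to each token vector (element-wise sum)."""
    if len(token_ids) != len(sources):
        raise ValueError("need exactly one source label per token")
    return [
        [t + s for t, s in zip(token_table[tok], source_table[src])]
        for tok, src in zip(token_ids, sources)
    ]


rng = random.Random(0)
DIM, VOCAB = 8, 100  # toy sizes, chosen only for illustration
token_table = make_table(VOCAB, DIM, rng)
source_table = make_table(len(Source), DIM, rng)

# The same token id (42) appearing in a user prompt and in a tool result.
vectors = embed([42, 42], [Source.USER_PROMPT, Source.TOOL_RESULT],
                token_table, source_table)
print(vectors[0] != vectors[1])  # do the two copies get different inputs?
```

The point of the sketch is that the source label travels with the token as part of its input vector, rather than being inferred from delimiter tokens elsewhere in the context. Summing is only one option; concatenating the source vector would be another.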
The vast majority of this training data is generated synthetically.
each by itself, they with both interactions.
2!