
That's something I'm wondering as well. Not sure how it is with frontier models, but from what you can see on Hugging Face, the "standard" method to distinguish tokens still seems to be special delimiter tokens or even just formatting.

Are there technical reasons why you can't make the "source" of the token (system prompt, user prompt, model thinking output, model response output, tool call, tool result, etc) a part of the feature vector - or even treat it as a different "modality"?

Or is this already being done in larger models?
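A minimal sketch of what "source as part of the feature vector" could look like, assuming the simplest possible scheme: a one-hot source indicator concatenated onto each token's vector. The `SOURCES` list mirrors the categories named above; the helper names and the example vector are made up for illustration and do not describe how any existing model works.

```python
# Hypothetical source categories, taken from the list in the comment above.
SOURCES = ["system", "user", "thinking", "response", "tool_call", "tool_result"]


def one_hot(source):
    """Return a one-hot list marking which source a token came from."""
    return [1.0 if s == source else 0.0 for s in SOURCES]


def with_source(token_vector, source):
    """Append the one-hot source indicator to a token's feature vector."""
    return list(token_vector) + one_hot(source)


# Made-up 3-dimensional embedding for some token, e.g. "test".
token_vector = [0.2, -0.1, 0.4]

from_user = with_source(token_vector, "user")
from_tool = with_source(token_vector, "tool_result")
```

The same token keeps identical content dimensions in both cases and differs only in the appended source dimensions, so the origin is carried out of band rather than through delimiter tokens in the text itself.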

by jerf 7 hours ago

By the nature of the LLM architecture I think if you "colored" the input via tokens the model would about 85% "unlearn" the coloring anyhow. Which is to say, it's going to figure out that "test" in the two different colors is the same thing. It kind of has to; after all, you don't want to be talking about a "test" in your prompt and have it be completely unable to connect that to the concept of "test" in its own replies. The coloring would end up as just another language in an already multi-language model. It might slightly help but I doubt it would be a solution to the problem. And it would possibly come at an unacceptable loss of capability, as the model would burn some of its capacity on that "unlearning".

by oezi 10 hours ago

Instead of using just positional encodings, we absolutely should have speaker encodings added on top of tokens.
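A rough sketch of the idea, assuming the speaker encoding is a learned embedding summed with the token and positional embeddings (similar in spirit to segment embeddings in BERT-style models). The table sizes, the `ROLES` list, and the random initial values are all placeholders, not taken from any real model.

```python
import random

# Hypothetical speaker roles; a real system might pick a different set.
ROLES = ["system", "user", "thinking", "response", "tool_call", "tool_result"]
DIM = 4


def make_table(rows, dim, rng):
    """Build a small embedding table filled with random placeholder values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


def embed(token_ids, role_ids, tok_table, pos_table, role_table):
    """Sum token, position, and speaker embeddings for each input position."""
    out = []
    for pos, (tok, role) in enumerate(zip(token_ids, role_ids)):
        out.append([
            t + p + r
            for t, p, r in zip(tok_table[tok], pos_table[pos], role_table[role])
        ])
    return out


rng = random.Random(0)
tok_table = make_table(10, DIM, rng)          # vocabulary of 10 token ids
pos_table = make_table(8, DIM, rng)           # up to 8 positions
role_table = make_table(len(ROLES), DIM, rng)

# The same token id (3) appears twice: once from the user, once in a tool result.
token_ids = [3, 3]
role_ids = [ROLES.index("user"), ROLES.index("tool_result")]
vectors = embed(token_ids, role_ids, tok_table, pos_table, role_table)
```

Because the speaker signal is added at every position, it cannot be forged by text that merely imitates a delimiter; whether the network keeps using that signal after training is a separate question.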

by easeout 7 hours ago

Because they're the main prompt injection vector, I think you'd want to distinguish tool results from user messages. By the time you go that far, you need colors for those two, plus system messages, plus thinking/responses. I have to think it's been tried and it just cost too much capability but it may be the best opportunity to improve at some point.

by jhrmnn 10 hours ago

Because then the training data would have to be coloured

by __alexs 10 hours ago

I think OpenAI and Anthropic probably have a lot of that lying around by now.

by jhrmnn 9 hours ago

So most training data would be grey and a little bit coloured? Ok, that sounds plausible. But then maybe they tried and the current models already get it right 99.99% of the time, so observing any improvement is very hard.

by nairboon 9 hours ago

They have a lot of data in the form: user input, LLM output. Then the model learns what the previous LLMs produced, with all their flaws. The core LLM premise is that it learns from all available human text.

by __alexs 9 hours ago

This hasn't been the full story for years now. All SOTA models are strongly post-trained with reinforcement learning to improve performance on specific problems and interaction patterns.

The vast majority of this training data is generated synthetically.

by layer8 9 hours ago

This has the potential to improve things a lot, though there would still be a failure mode when the user quotes the model or the model (e.g. in thinking) quotes the user.

by efromvt 10 hours ago

I’ve been curious about this too - there's an obvious performance overhead to having an internal/external channel, but it might make training away this class of problems easier.

by cyanydeez 11 hours ago

You would have to train it three times for two colors.

Each by itself, then with both interacting.

by __alexs 11 hours ago

The models are already massively overtrained. Perhaps you could do something like initialise the 2 new token sets based on the shared data, then use existing chat logs to train it to understand the difference between input and output content? That's only a single extra phase.

by vanviegen 11 hours ago

You should be able to first train it on generic text once, then duplicate the input layer and fine-tune on conversation.
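To make the "duplicate the input layer" step concrete, here is a toy sketch, assuming the input layer is a plain embedding table and that each color gets its own independent copy of the pretrained weights. The table contents are random placeholders, and the single in-place nudge stands in for real fine-tuning, which is not implemented here.

```python
import copy
import random


def duplicate_input_table(base_table, num_colors):
    """Return one independent copy of the pretrained table per color."""
    return [copy.deepcopy(base_table) for _ in range(num_colors)]


def lookup(tables, color, token_id):
    """Fetch the embedding of a token as seen in the given color."""
    return tables[color][token_id]


rng = random.Random(0)
# Placeholder for an embedding table pretrained on generic, uncolored text.
base = [[rng.uniform(-0.1, 0.1) for _ in range(4)] for _ in range(10)]

INPUT, OUTPUT = 0, 1  # hypothetical color ids
tables = duplicate_input_table(base, 2)

# Right after duplication, a token means exactly the same thing in both colors.
same_before = lookup(tables, INPUT, 3) == lookup(tables, OUTPUT, 3)

# Stand-in for a fine-tuning update that only touches the OUTPUT copy.
tables[OUTPUT][3][0] += 0.05

# The two colors of the same token can now drift apart.
same_after = lookup(tables, INPUT, 3) == lookup(tables, OUTPUT, 3)
```

Starting both copies from the shared weights means the model begins with everything it already knows about each token, and the fine-tuning phase only has to learn where the colors should differ.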