upvote
Transformers look like perfect tech for keeping track of how deep and inside of what we are at the moment.
reply
Transformers are able to recognize balanced brackets grammar at 97% success rate: https://openreview.net/pdf?id=kaILSVAspn

This is 3% or infinitely far away from the perfect tech.

The perfect tech is the stack.

reply
This is very interesting since there is another notable paper which shows LLMs can recognize and generate CFGs

https://arxiv.org/abs/2305.13673

and of course a^n b^n is also classic CFG, so it's not clear why one paper had positive results while the other hand negative.

reply
Dyck grammar (balanced brackets) are not an a^nb^n, there are several kinds of brackets.

I cannot find probability of success in paper you linked. Is it 100%? I believe it is less than 100%, because LLMs are intrinsically probabilistic machines.

reply
Basically, the only way you're separting user input from model meta-input is using some kind of character that'll never show up in the output of either users or LLMs.

While technically possible, it'd be like a unicode conspiracy that had to quietly update everywhere without anyone being the wiser.

reply
Not at all. You have a set of embeddings for the literal token, and a set for the metadata. At inference time all input gets the literal embedding, the metadata embedding can receive provenance data or nothing at all. You have a vector for user query in the metadata space. The inference engine dissallows any metadata that is not user input to be close to the user query vector.

Imagine a model finteuned to only obey instructions in a Scots accent, but all non user input was converted into text first then read out in a Benoit Blanc speech model. I'm thinking something like that only less amusing.

reply
Couldn't you just insert tokens that don't correspond to any possible input, after the tokenization is performed? Unicode is bounded, but token IDs not so much.
reply
This already happens, user vs system prompts are delimited in this manner, and most good frontends will treat any user input as "needing to be escaped" so you can never "prompt inject" your way into emitting a system role token.

The issue is that you don't need to physically emit a "system role" token in order to convince the LLM that it's worth ignoring the system instructions.

reply