For a single-word response, perhaps.

But for anything else I wouldn't.

The entire chain is affected from the point where the tokenization differs on down. Even if the output lands in roughly the same semantic area, that doesn't mean it gets there with anything like the same syntactic choices. Anywhere several tokens were nearly tied, the model could easily take a different route given even minor fluctuations in the starting conditions. It's chaotic.
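A toy sketch of that kind of sensitivity, assuming nothing about any real model: the "state" below follows the logistic map, a textbook chaotic system used here purely as a stand-in, and each step greedily picks between two made-up tokens whose scores are nearly tied whenever the state is close to 0.5. The names `toy_decode` and `first_divergence` are hypothetical helpers for this illustration only.

```python
def toy_decode(x0, steps):
    """Greedy-decode a toy two-token 'model' whose state follows the logistic map."""
    x = x0
    tokens = []
    for _ in range(steps):
        # Chaotic state update; a stand-in, not what a language model computes.
        x = 4.0 * x * (1.0 - x)
        # Two candidate tokens; their scores are nearly tied when x is near 0.5.
        scores = {"a": 1.0 - x, "b": x}
        # Greedy selection: take the higher-scoring token.
        tokens.append(max(scores, key=scores.get))
    return "".join(tokens)


def first_divergence(s, t):
    """Return the first index where two strings differ, or None if none is found."""
    for i, (u, v) in enumerate(zip(s, t)):
        if u != v:
            return i
    return None


base = toy_decode(0.3, 100)
nudged = toy_decode(0.3 + 1e-7, 100)  # starting condition shifted by one part in ten million

print(base)
print(nudged)
print("first differing position:", first_divergence(base, nudged))
```

The two runs agree for the first few tokens and then split, after which they have nothing in common. Whether real decoding behaves this way for a given prompt is an empirical question; the sketch only shows that near-ties plus feedback are enough to produce it.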

reply
I don't know about single letters, but single words?

"Score this resumé. Applicant: Jim ..."

"Score this resumé. Applicant: Greg..."

Is it obvious to anyone that these will have the same modal response?

reply
I believe there's some data showing that responses differ when the names signal different cultural, racial, or gender affiliations. Here be dragons.
reply
"You are a helpful/less assistant"

Give it a try. It's a four-letter difference. Add a few hundred tokens describing the task, so that the change becomes a tiny fraction of the input.

Discontinuities everywhere.

reply
But those are VERY different types of assistant. It is correct behavior that you would get different outputs in this case.
reply