
by vova_hn2 10 hours ago

by lanyard-textile 7 hours ago

You'd be surprised. This could match the model's training to proceed by using a tool, for example.

by jerf 9 hours ago

There is a study that shows that what the model is doing behind the scenes in those cases is a lot more than just outputting those tokens.

For an LLM, tokens are thought. They have no ability to think, by whatever definition of that word you like, without outputting something. The token only represents a tiny fraction of the internal state changes made when a token is output.

Clearly there is an optimum for each task (not necessarily a global one), and a concrete model for a given task can be arbitrarily far from it. But you'd need to test it for each case, not just assume that "less tokens = more better". If you're not testing, you can be making your model dumber without realizing it.

by DonHopkins 9 hours ago

High-dimensional vectors are thought (insofar as you can define what that even means). Tokens are the one-dimensional input that navigates the thought and the output that renders it. The "thinking" takes place in the high-dimensional space, not in the one-dimensional stream of tokens.
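A toy sketch of the dimensionality point, assuming greedy decoding through a random linear unembedding. All sizes and names (`W_UNEMBED`, `logits`, `greedy_token`) are hypothetical and not taken from any real model. Two distinct hidden states render to the same token, and however many states you sample, only `VOCAB_SIZE` distinct outputs are possible, so a single emitted token can carry at most log2(vocabulary size) bits.

```python
import math
import random

VOCAB_SIZE = 8    # toy vocabulary (assumption; real models have tens of thousands of tokens)
HIDDEN_DIM = 16   # toy hidden size (assumption; real models use thousands of dimensions)

random.seed(0)
# Hypothetical unembedding matrix: one weight row per vocabulary entry.
W_UNEMBED = [[random.gauss(0, 1) for _ in range(HIDDEN_DIM)] for _ in range(VOCAB_SIZE)]

def logits(hidden):
    """Score every vocabulary entry against a hidden-state vector."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in W_UNEMBED]

def greedy_token(hidden):
    """Collapse the whole vector to one integer: the id of the highest-scoring token."""
    scores = logits(hidden)
    return max(range(VOCAB_SIZE), key=scores.__getitem__)

h1 = [random.gauss(0, 1) for _ in range(HIDDEN_DIM)]
h2 = [2.0 * x for x in h1]  # a different state; scaling by 2 doubles every logit
print(h1 == h2)                              # False
print(greedy_token(h1) == greedy_token(h2))  # True: same token, different "thought"

# However many distinct states we sample, at most VOCAB_SIZE tokens can come out.
samples = [[random.gauss(0, 1) for _ in range(HIDDEN_DIM)] for _ in range(1000)]
print(len({greedy_token(h) for h in samples}) <= VOCAB_SIZE)   # True
print(f"one token carries at most {math.log2(VOCAB_SIZE):.0f} bits")
```

Scaling is used only because it guarantees the same argmax; in a real model many unrelated states also decode to the same token.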

by gchamonlive 8 hours ago

But aren't the one-dimensional tokens a reflection of the high-dimensional space? What you see is "sure, let's take a look at that", but behind the curtain it's actually an indication that the model is searching a very specific latent space, which might be radically different if those tokens didn't exist. Or not. In any case, you can't just make that claim and treat those two processes in isolation. They might be totally unrelated, but they might also be tightly interconnected.
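To make the "tightly interconnected" possibility concrete, here is a toy sketch assuming a simple recurrent update rather than a real transformer (actual models condition each new position on all earlier tokens through attention). The embedding table `EMBED` and the helpers `step` and `roll_out` are hypothetical names for illustration. It shows that an extra filler token emitted early leaves a trace in the state that later tokens are computed from, even when the rest of the sequence is identical.

```python
import random

random.seed(1)
VOCAB_SIZE = 8   # toy vocabulary size (assumption)
HIDDEN_DIM = 4   # toy hidden size (assumption)

# Hypothetical token embeddings: the vector each emitted token feeds back in.
EMBED = [[random.gauss(0, 1) for _ in range(HIDDEN_DIM)] for _ in range(VOCAB_SIZE)]

def step(hidden, token):
    """Toy recurrent update: decay the previous state, add the emitted token's embedding."""
    return [0.5 * h + e for h, e in zip(hidden, EMBED[token])]

def roll_out(hidden, tokens):
    """Feed a fixed sequence of tokens back in and return the final state."""
    for t in tokens:
        hidden = step(hidden, t)
    return hidden

start = [0.0] * HIDDEN_DIM
a = roll_out(start, [3, 5, 1])  # token 3 plays the role of a filler token
b = roll_out(start, [5, 1])     # same continuation without the filler
print(a != b)  # True: the filler changed the state later tokens are computed from
```

In this toy the filler's influence halves with every step; whether it matters in a real model is exactly the empirical question being argued here.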

by sheiyei 8 hours ago

I assume that, in practice, filler words add nothing of value. When words add or mean nothing (their weights are basically 0 relative to the subject), I don't see why they'd affect what the model outputs, except by causing more filler words.

by gchamonlive 8 hours ago

Politeness has an impact (https://arxiv.org/abs/2402.14531), so I wouldn't be too quick to make any kind of claim about a technology when we don't know exactly how it works.

by xgulfie 8 hours ago

[flagged]

by rokob 7 hours ago

[flagged]

by xgulfie 7 hours ago

[flagged]

by wzdd 9 hours ago

They carry information in regular human communication, so I'm genuinely curious why you'd think they would not when an LLM outputs them as part of the process of responding to a message.