
  > The model cares about what you're saying, not what language you're saying it in.
How many languages is the model trained on? And how many sentences are in the training set? I believe these numbers are vastly different, and the cosine similarity is overwhelmingly biased by the number of sentences.

What if we equalized the number of languages and the number of sentences per language in the training set? A galaxy-wide LLM, so to speak.

Also, the model can't help but care about language, because your work shows divergence of cosine similarity at the decoding (output) stages.

reply
I'm trying to understand what you said; please correct me if I'm wrong here.

Would this be sort of like saying that the embeddings of equivalent primitives across languages end up distributed in a vector space following the same principles and "laws"?

For example, if I train on a large corpus of English and, separately, a large corpus of Spanish, will the language constructs that are equivalent across the two end up represented with the same vector-space patterns in both cases?

reply
This does seem to happen, at least close enough that it's possible to align embedding spaces across languages and do some translation without training on parallel texts.
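For the curious: one standard way to do that alignment is orthogonal Procrustes. Here's a toy sketch with NumPy (all names and data are made up for illustration): the "Spanish" space is just a rotated copy of the "English" one, and a rotation is recovered from a small seed dictionary of anchor pairs, then checked on held-out vectors.

```python
import numpy as np

# Toy "embedding spaces": rows are word vectors from two hypothetical
# monolingual models. Here the second space is literally a rotation of
# the first, so a perfect linear alignment exists by construction.
rng = np.random.default_rng(0)
en = rng.normal(size=(50, 8))              # 50 anchor words, 8-dim "English" space
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
es = en @ q                                # same geometry, different basis

# Orthogonal Procrustes: find the rotation W minimizing ||en @ W - es||_F,
# using only a small seed dictionary (the first 20 word pairs).
u, _, vt = np.linalg.svd(en[:20].T @ es[:20])
w = u @ vt

# The learned map should also translate the held-out 30 vectors.
err = np.linalg.norm(en[20:] @ w - es[20:])
print(err < 1e-6)  # alignment generalizes beyond the seed pairs
```

Real cross-lingual spaces are only approximately isometric, so in practice (e.g. MUSE-style methods) the mapping is learned from a seed dictionary or adversarially and translation is done by nearest-neighbor lookup under the mapped vectors.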
reply
A fun thing to do is convince a model to fluidly switch between character sets to express ideas as 'efficiently' as possible. It likes to use Chinese hanzi a lot for abstract concepts. I've also seen Gemini use them unprompted in the middle of an English sentence.
reply
AIs code-switching between human languages is cyberpunk AF.
reply
Extrapolating the benchmarks, this would imply the best RYS 27B is capable of outperforming the 397B MoE?
reply