The Chinese alphabet is very much a dictionary. All the major tokenizers are far larger.
reply
That doesn’t make any sense. An alphabet is a list of valid characters. A dictionary is not just a list. Even in a language like Chinese where individual characters carry meaning, a dictionary tells you what that meaning is. It’s not just a list of characters.

Or, to echo the article, the dictionary is made out of weights.

reply
A list of words isn’t a dictionary. What a dictionary adds over a list of words is all the relationships between the words needed to interpret them and use them, and all of that is in the weights.
reply
We should tell the Unix people that they've been giving /usr/share/dict the wrong name for over three decades. (-:
reply
I mean, they did, and we have, and we've also stopped doing that.

https://en.wikipedia.org/wiki/Words_(Unix)

reply
A mapping of Chinese characters to integers (like a tokenizer) would not be a dictionary. You’d also need definitions. At best it’s an index to a hypothetical dictionary.
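
To make the distinction concrete, here is a minimal sketch. The characters, IDs, and glosses are toy data I made up for illustration, not taken from any real tokenizer or dictionary, and `encode` and `define` are hypothetical helper names.

```python
# Toy data for illustration only; not from any real tokenizer or dictionary.

# Tokenizer-style vocabulary: character -> integer ID. No meaning attached.
vocab = {"山": 0, "水": 1, "火": 2}

# Dictionary-style table: character -> definition.
definitions = {
    "山": "mountain",
    "水": "water",
    "火": "fire",
}


def encode(text):
    """Map each character to its integer ID (raises KeyError if unknown)."""
    return [vocab[ch] for ch in text]


def define(ch):
    """Look up a character's definition, or None if it has no entry."""
    return definitions.get(ch)


ids = encode("山水")   # a list of integers, nothing more
gloss = define("火")   # only the second table can say what a character means
```

The first table can tell you that a character is in the set and which slot it occupies. Only the second says anything about what it means, which is why the first is at best an index.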
reply
It's beside the point, so I only note it out of interest, but the Chinese writing system doesn't use an alphabet (or a syllabary like Japanese kana); it's a logography.
reply