As an aside, I don’t think the benefits LLMs bring to non-English users are widely understood. I studied linguistics and Russian, and I’m capable of professional interpretation in English and Russian. Even so, I can read technical documents, understand them, and communicate about them much faster and with far less effort in my native language, Korean. These days, I read most English documentation and HN posts through Chrome’s automatic translation. Sometimes the translation is ambiguous, but in those cases I can immediately refer back to the original English. This has been a major help to me and to other Korean developers I work with.
Polish prompts tend to be shorter because the language has many verb forms and conjugations. The only "bad" thing for me is that when the model says "it broke," it tends to use uncanny, blunt words that sometimes make me laugh.
What are your prompts like?
After all `def func():` is only 3 tokens on o200k_base.
The authoritative answer to this question would come from the millions (or tens of millions) of Chinese speakers who are currently using LLMs to write software.
However, I suspect you would see no advantage from using any language other than English. While some written languages are denser at the token level, the benefits of that density (and of the more recent discussion around “caveman talk”) seem quite limited.
Furthermore, consider that the vast majority of textbooks, technical documentation, blog posts, StackOverflow answers, &c. are originally written in English. Historically, where these have been translated into Chinese, the translations have often been of very poor quality (and the terminology and phraseology are often incomprehensible unless you also understand some English). I would suspect that such translated material makes up the overwhelming majority of the Chinese-language text in the training sets for these models.
That said, my experience with the most recent models is that they are surprisingly language-agnostic, in a way that surpasses readily available human capability. For example, I can prompt an LLM to translate English into something that uses German grammar, Chinese vocabulary, and Japanese characters, and I'll get an output that is worse than what a human expert could produce… but where am I going to find such a multilingual expert?
(Of course, I have so far only ever been impressed that a model could generate an output but never impressed with the output it did generate. Everything—translations, prose, code—seems universally sloppy and bland and muddy.)
So the biggest benefit I would anticipate for a Chinese speaker today… is that if they are uninterested in working internationally, they have significantly less need to learn English.
It is the revenge of UML modeling.
Eventually it will get good enough that what comes out of agent work is a matter of formal specification.
That is assuming code is actually needed at all, and the task cannot be handled as a pure agent-orchestration workflow.