I’m Korean, and I’ve used GitHub Copilot, Claude Code, and Codex. At first, I prompted them in English, but over time I came to the conclusion that using Korean works better for me. It may consume more tokens, but reducing the time spent understanding and correcting the plan is more valuable. That said, when the context gets close to its limit, the responses sometimes include Korean words that do not actually exist.

As an aside, I don’t think the benefits LLMs bring to non-English users are widely understood. I studied linguistics and Russian, and I’m capable of professional interpretation in English and Russian. Even so, I can read technical documents, understand them, and communicate about them much faster and with far less effort in my native language, Korean. These days, I read most English documentation and HN posts through Chrome’s automatic translation. Sometimes the translation is ambiguous, but in those cases I can immediately refer back to the original English. This has been a major help to me and to other Korean developers I work with.

reply
I'm using it 50% English (personal projects) and 50% Polish (workplace; the reasons being agents.md and a team that is not that proficient in English), and honestly I haven't seen much difference in the output or its ambiguity.

Polish prompts tend to be shorter because the language has a lot of verb forms and conjugations. The only "bad" thing for me is that when it's saying "it broke," it tends to use uncanny, blunt words that sometimes make me laugh.

reply
Same here, been using 50% English and 50% Spanish for months, no particular reason, just whatever feels easier at the moment. Sometimes I even switch languages in the middle of a session. I have not noticed a difference in the quality of the output.
reply
Interesting. Some questions: would you say Polish is more dense or less dense than English? It's interesting to hear that code quality doesn't suffer even though the response text is sillier or blunter. Any other discrepancies compared to English?
reply
I would say it certainly can be more dense, but even when it is, the tokenizers count it as more. Last time I checked my agents.md in the OpenAI tokenizer, the Polish version ate about 30-40% more tokens than the English version at roughly 1:1 meaning.
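
If you want to check this yourself, here's a minimal sketch using OpenAI's tiktoken library rather than the web tokenizer (the file names are hypothetical stand-ins for the two translations):

    # Compare token counts for two translations of the same document.
    # agents.en.md / agents.pl.md are hypothetical file names.
    import tiktoken

    enc = tiktoken.get_encoding("o200k_base")

    for path in ("agents.en.md", "agents.pl.md"):
        with open(path, encoding="utf-8") as f:
            text = f.read()
        print(path, len(enc.encode(text)), "tokens")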
reply
I think it will eventually become its own dialect of English. Telling LLMs what to do works better in not-quite-normal English, and I think this will continue until it is no longer recognizable as natural English but has become a new fuzzy programming language (probably more than one).
reply
I believe new (programming) languages will emerge, both for LLMs to parse and take instructions from and for them to generate code in. The former is because English is a nuanced language that evolved for human use, which LLMs don't quite need; its only advantage is a metric ton of training material. The same goes for Rust, Go, and the other languages LLMs currently code well in, all of which have concepts geared towards human convenience.
reply
>Telling LLMs what to do works better in not-quite-normal English

What are your prompts like?

reply
I wonder how well Mandarin works for LLM-based programming. On one hand, it's very token efficient as Mandarin script is very dense in meaning. On the other, I suppose this can increase ambiguity.
reply
Character density and token efficiency are different things. The latter is data-specific and therefore tokenizer-specific: take GPT-5's tokenizer, o200k_base, and run Mandarin text and its English translation through it. Some of the time, English will beat Mandarin. I just tested this with news articles and Wikipedia.

After all `def func():` is only 3 tokens on o200k_base.
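
You can reproduce this kind of check with the tiktoken library (a minimal sketch; the sample sentences are my own, not from the articles mentioned above):

    # Count o200k_base tokens for a code snippet and an en/zh sentence pair.
    import tiktoken

    enc = tiktoken.get_encoding("o200k_base")

    samples = {
        "code": "def func():",  # the snippet from the comment above
        "en": "The weather in Taipei is hot and humid in summer.",
        "zh": "台北夏天的天气炎热潮湿。",  # my own rough translation
    }
    for label, text in samples.items():
        print(label, len(enc.encode(text)), "tokens")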

reply
I can speak, read, and write Taiwanese Mandarin (which is likely relatively underrepresented in the training sets and which is, in my practical experience, materially different in its usage).

The authoritative answer for this question would best come from the millions (or tens of millions) of Chinese-speakers who are currently using LLMs to write software.

However, it is my suspicion that you would see no advantage to using any language other than English. While some written languages have a certain token-level density, it seems the benefits of this (and of the more recent discussion around “caveman talk”) are quite limited.

Furthermore, consider that the vast majority of textbooks, technical documentation, blog posts, StackOverflow answers, &c. are originally in English. Historically, where these have been translated to Chinese, the translations have often been of very poor quality (and the terminology and phraseology is often incomprehensible unless you also understand some English.) I would suspect that this makes up the overwhelming majority of the training sets for these models.

That said, my experience with the most recent models is that they are surprisingly language-agnostic in a way that surpasses readily available human capability. For example, I can prompt the LLM to translate English into something that uses German grammar, Chinese vocabulary, and Japanese characters, and I'll get an output that is worse than what a human expert could do… but where am I going to find such a multilingual expert?

(Of course, I have so far only ever been impressed that a model could generate an output but never impressed with the output it did generate. Everything—translations, prose, code—seems universally sloppy and bland and muddy.)

So the biggest benefit I would anticipate for a Chinese speaker today is that, if they are uninterested in working internationally, they have significantly less need to learn English.

reply
I use French nearly all the time, and it works well. Not that I can't write English prompts; I just find it easier to use my native language.
reply
I'm teaching my kids to be fluent in tokenese
reply
Natural language doesn’t have the precision required for building systems. We already have a language for specifying systems precisely. It’s called “code”…
reply
Well, what we've been seeing over the past few months is that natural language does - at least enough to build code and tests.
reply
I'm using it in English/Albanian. Not much difference really. Impressive.
reply
I agree, and those still too focused on code generation in specific languages are fighting the last war.

It is the revenge of UML modeling.

Eventually it will get good enough that what comes out of agent work is a matter of formal specification.

Assuming that code is actually needed and the result cannot be achieved as pure agent orchestration workflows.

reply
You really think that's what the positions on either side boil down to, how they feel about expressing themselves in English vs C++? No, that's ridiculous. That's such a wildly reductionist simplification.
reply