If there are millions of lines on github in your language.
Otherwise the 'teaching AI to write your language' part will occupy so much context and make it far less efficient that just using typescript.
The vast majority of tokens are not used for documentation or reference material but rather are for reasoning/thinking. Unless you somehow design a programming language that is just so drastically different than anything that currently exists, you can safely bet that LLMs will pick them up with relative ease.
That's assuming that your new, very unknown language gets slurped up in the next training session which seems unlikely. Couldn't you use RAG or have an LLM read the docs for your language?
There are languages that are already pretty sparse with keywords. e.g in Go you can write 'func main() string', no need to define that it's public, or static etc. So combining a less verbose language with 'codegolfing' the variables might be enough.
In go every third line is a noisy if err check.
Claude seems more consistently _concise_ to me, both in web and cli versions. But who knows, after 12 months of stuff it could be me who is hallucinating...
Programming languages function in large parts as inductive biases for humans. They expose certain domain symmetries and guide the programmer towards certain patterns. They do the same for LLMs, but with current AI tech, unless you're standing up your own RL pipeline, you're not going to be able to get it to grok your new language as well as an existing one. Your chances are better asking it to understand a library.
How will it "learn" anything if the only available training data is on a single website?
LLMs struggle with following instructions when their training set is massive. The idea that they will be able to produce working software from just a language spec and a few examples is delusional. It's a fundamental misunderstanding of how these tools work. They don't understand anything. They generate patterns based on probabilities and fine tuning. Without massive amounts of data to skew the output towards a potentially correct result they're not much more useful than a lookup table.
I'm using Claude Code to work on something involving a declarative UI DSL that wraps a very imperative API. Its first pass at adding a new component required imperative management of that component's state. Without that implementation in context, I told Claude the imperative pattern "sucks" and asked for an improvement just to see how far that would get me.
A human developer familiar with the codebase would easily understand the problem and add some basic state management to the DSL's support for that component. I won't pretend Claude understood, but it matched the pattern and generated the result I wanted.
This does suggest to me that a language spec and a handful of samples is enough to get it to produce useful results.
I have done exactly the above with great success. I work with a weird proprietary esolang sometimes that I like, and the only documentation - or code - that exists for it is on my computer. I load that documentation in, and it works just fine and writes pretty decent code in my esolang.
"But that can't possibly work [based on my misunderstanding of how LLMs work]!" you say.
Well, it does, so clearly you misunderstand how they work.
The impact that lack of training data has on the quality of the results is easily observable. Try getting them to maintain a Python codebase vs. e.g. an Elixir one. Not just generate short snippets of code, but actually assist in maintaining it. You'll constantly run into basic issues like invalid syntax, missing references, use of nonexistent APIs, etc., not to mention more functional problems like dead, useless, or unnecessarily complicated code. I run into these things with mainstream languages (Go, Python, Clojure), so I don't see how an esolang could possibly fair any better.
But then again, the definitions of "just fine" and "decent" are subjective, and these tools are inherently unreliable, which is where I suspect the large disconnect in our experiences comes from.
Probably if you’re trying to be esoteric and arcane then yeah, you might have trouble, but that’s not normally how languages evolve.