Just before that, at work, I found a bug in an AI-driven refactor. For some reason, both the original refactor and the AI-driven autocomplete kept trying to pass the wrong parameters to a function. It was determined to get it wrong, even after I manually fixed it. [Edit - I should also mention the AI-driven code review agent tried to do the same thing. The clankers are consistent.]
This is why familiarity with your language matters: at some point, you'll hit bugs that the AI can't fix. And by the way, I use LLM tools at work and have a set of skills that improve my productivity, if not my QoL. But I still need to be able to dive into the language and the build tools and fix things.
Stability, consistency, and simplicity are much more important than this notion of familiarity (i.e., having lots of code to train on), as long as the corpus is sufficiently large. Another important factor is how clear and accessible libraries are, especially standard libraries.
Take Zig, for example. It's a very explicit and clear language with easy access to the std lib, and for a young language it is consistent in its style. An agent can write reasonable Zig code and debug issues from tests. However, the language is still unstable and APIs change, so LLMs regularly get confused.
Languages and ecosystems that are more mature and take stability very seriously, like Go or Clojure, don't have the problem of "LLM hallucinates APIs" nearly as much.
The thing with Clojure is also that it's a very expressive and very dynamic language. You can hook an agent up to the REPL and it can very quickly validate or explore things. With most other languages, the agent has to change a file (multiple, more complex operations), then write an explicit test, then run that test to get the same result as `defn`-ing a function and running a few invocations.
Counterexample: the Wolfram programming language (better known to many people from the Mathematica computer algebra system).
It is incredibly mature and takes stability very seriously, but in my experience LLMs tend to hallucinate a lot when you ask them to write Wolfram or Mathematica code.
I see two reasons:
1. There is less Wolfram/Mathematica code online than for many other popular programming languages.
2. Code in Wolfram is often very concise, so it is less forgiving of "somewhat correct" code (which is, in my opinion, mostly a good thing), and LLMs therefore tend to struggle writing Wolfram/Mathematica code.
A stable, mature framework is then the best-case scenario. New or rapidly changing frameworks will be difficult, wasting lots of tokens on discovery and corrections.
Up to a point, I guess? There must be a point of diminishing returns based on the expressiveness of the language.
I mean, a language that has 8 different ways to declare and initialise composite variables needs a much larger training corpus than a language that has only 2 or 3.
The more expressive a language, the more distinct-but-equally-suitable patterns appear in the wild, which means a larger corpus is needed.