In particular, the whole stack-based thing looks questionable.
In fact, the very first answer by Gemini proposed an APL-like encoding of the primitives to save tokens, but when I started the implementation, Claude Code pushed back on that, saying it would need to keep some sane semantics around the keywords to be able to understand the programs.
The very strict verification story seems more plausible, and it tracks with the rest of the comments here.
What has surprised me is that the language works at all. Adding todo items to a web app written in a week-old language felt a bit eerie.
Have the LLMs generate tests that measure the “ease of use” and “effectiveness” of coding agents using the language.
Then have them use these tests to get data for their language design process.
They should also smoke test their own “meta process” here. E.g., write a toy language that should be obviously much worse for LLMs, and then verify that the effectiveness tests produce a result agreeing with that.
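To make that sanity check concrete, here is a rough Python sketch of what I mean. Everything in it is made up for illustration: the `TaskResult` record, the `candidate`/`control` labels, the task names, the numbers and the `min_gap` threshold are all assumptions, not anything from the actual project. The point is only the shape of the check: if the deliberately bad control language does not score clearly worse, the benchmark itself is suspect.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    language: str   # which language the agent was asked to use
    task: str       # hypothetical task name
    passed: bool    # did the agent's program pass the task's tests?
    attempts: int   # agent turns used before passing or giving up


def pass_rate(results, language):
    """Fraction of tasks the agent solved in the given language."""
    rows = [r for r in results if r.language == language]
    if not rows:
        raise ValueError(f"no results for {language!r}")
    return sum(r.passed for r in rows) / len(rows)


def benchmark_detects_control(results, candidate, control, min_gap=0.2):
    """True if the control language scores at least min_gap below the candidate."""
    return pass_rate(results, candidate) - pass_rate(results, control) >= min_gap


# Placeholder data, invented purely to exercise the check.
results = [
    TaskResult("candidate", "add-todo", True, 1),
    TaskResult("candidate", "sort-list", True, 2),
    TaskResult("candidate", "parse-csv", False, 5),
    TaskResult("control", "add-todo", True, 4),
    TaskResult("control", "sort-list", False, 5),
    TaskResult("control", "parse-csv", False, 5),
]

if __name__ == "__main__":
    if benchmark_detects_control(results, "candidate", "control"):
        print("benchmark separates the control language; results are worth reading")
    else:
        print("benchmark cannot tell the languages apart; fix the benchmark first")
```

A single pass rate is a crude metric; the same check could be run on `attempts` or on token counts, as long as the control language is included every time.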
I await the blog post :)
I have programmed about 3 Forth implementations by hand throughout the years for fun, but I have never been able to really program in it, because the stack wrangling confuses me enormously.
So for me anything vaguely complex is unreadable, but apparently not for the LLMs, which I find surprising. When I have interrogated them, they say they like the lack of syntax more than the stack ops hamper them, but it might be just a hallucinated impression.
When they write Cairn I sometimes see stack-related error messages scroll by, but they always correct them quickly before they stop.
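For anyone who has not played with a stack language, here is a toy Forth-like evaluator in Python showing the kind of wrangling and the kind of error I mean. To be clear, this is not Cairn and not its syntax or error format; the words (`dup`, `swap`, `drop`, arithmetic) are just the generic Forth ones, and `run` and `StackError` are names I made up for the sketch.

```python
class StackError(Exception):
    """Raised when a word needs more values than the stack holds."""


def run(program):
    """Evaluate a whitespace-separated program and return the final stack."""
    stack = []

    def pop(word):
        if not stack:
            raise StackError(f"stack underflow in '{word}'")
        return stack.pop()

    for word in program.split():
        if word == "dup":
            a = pop(word)
            stack += [a, a]
        elif word == "swap":
            b = pop(word)
            a = pop(word)
            stack += [b, a]
        elif word == "drop":
            pop(word)
        elif word in ("+", "-", "*"):
            b = pop(word)  # top of stack is the right-hand operand
            a = pop(word)
            stack.append({"+": a + b, "-": a - b, "*": a * b}[word])
        else:
            try:
                stack.append(int(word))  # anything else must be an integer literal
            except ValueError:
                raise StackError(f"unknown word '{word}'") from None
    return stack


if __name__ == "__main__":
    print(run("2 3 -"))        # operands in source order
    print(run("2 3 swap -"))   # one 'swap' flips the operands, and the result
    print(run("3 dup *"))      # squares the top of the stack
    try:
        run("2 +")             # only one value available for '+'
    except StackError as err:
        print("error:", err)
```

A single misplaced `swap` changes the answer silently, while a missing operand at least fails loudly. The second kind is what a verifier can catch and hand back to the model.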