> Training data can't be the whole answer.

Absolutely correct. Anthropic showed that roughly 250 poisoned documents are enough to backdoor an LLM -- independent of the model's parameter count.

reply
Very true.

I have to steer models hard for C++. They constantly suggest std::variant :P

reply
is that bad?

Godbolt got a 2x speed improvement switching from what he thought was a good, fast implementation to std::variant:

https://www.youtube.com/watch?v=gg4pLJNCV9I

reply
In higher-dimensional vector spaces, yes it can.

Geometry gets counterintuitive in 1000-D space: almost all random vectors are nearly orthogonal, so similarity and orthogonality behave in unfamiliar ways, and each dimension can encode a different semantic feature.

So if the training data is highly consistent, you are effectively reducing complexity and encoding cleaner similarity structure.
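
To make that concrete, here's a small sketch (my own illustration, not something from the thread): sample random vectors in 1000-D and their cosine similarities cluster tightly around zero, which is why any consistent structure in the training data stands out so sharply.

    // Sketch: random directions in high-dimensional space are almost
    // always nearly orthogonal (cosine similarity ~ 0, spread ~ 1/sqrt(dim)).
    package main

    import (
        "fmt"
        "math"
        "math/rand"
    )

    func randVec(dim int) []float64 {
        v := make([]float64, dim)
        for i := range v {
            v[i] = rand.NormFloat64()
        }
        return v
    }

    func cosine(a, b []float64) float64 {
        var dot, na, nb float64
        for i := range a {
            dot += a[i] * b[i]
            na += a[i] * a[i]
            nb += b[i] * b[i]
        }
        return dot / (math.Sqrt(na) * math.Sqrt(nb))
    }

    func main() {
        const dim, trials = 1000, 10000
        var sum, sumSq float64
        for t := 0; t < trials; t++ {
            c := cosine(randVec(dim), randVec(dim))
            sum += c
            sumSq += c * c
        }
        mean := sum / trials
        std := math.Sqrt(sumSq/trials - mean*mean)
        // Expect mean ~ 0.0000 and std ~ 0.0316 (= 1/sqrt(1000)).
        fmt.Printf("mean cosine = %.4f, std = %.4f\n", mean, std)
    }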

In Go, the statement

    result, err := Storage.write(...)

is almost always going to be followed by

    if err != nil { ... }

In a highly dynamic language you may not get

    try { Storage.write() } catch (error) { ... }

unless explicitly asked for.
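
Spelled out as a runnable sketch (the DiskStorage type and its failing Write are hypothetical stand-ins; the thread only shows the fragment):

    // The canonical Go shape: a (value, error) return immediately
    // followed by an err check.
    package main

    import (
        "errors"
        "fmt"
    )

    type DiskStorage struct{}

    // Write always fails here, just to exercise the error path.
    func (DiskStorage) Write(data []byte) (int, error) {
        return 0, errors.New("disk full")
    }

    func main() {
        var storage DiskStorage
        n, err := storage.Write([]byte("payload"))
        if err != nil { // the near-inevitable next line
            fmt.Println("write failed:", err)
            return
        }
        fmt.Println("wrote", n, "bytes")
    }
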
reply
It's a little bit old, but it may challenge your opinions about what matters for LLM agentic coding:

https://github.com/Tencent-Hunyuan/AutoCodeBenchmark/blob/ma...

reply
> In a highly dynamic language you may not get

Being dynamic is secondary. In a language that uses exceptions for errors, you don't need to wrap every call in a try/catch if the code has nothing local to do about the failure; a single top-level handler can catch everything.
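
Go signals such failures with panics rather than exceptions, so this is only an analogue, but the shape is the same: one deferred handler near the top absorbs whatever the code below doesn't handle itself.

    // Analogue of a top-level exception handler, using Go's panic/recover.
    package main

    import "fmt"

    func write() {
        panic("storage unavailable") // hypothetical failure deep in the stack
    }

    func main() {
        defer func() { // single top-level handler; no guard at each call site
            if r := recover(); r != nil {
                fmt.Println("recovered at top level:", r)
            }
        }()
        write()
    }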

reply
> LLMs are really good at translating to different programming languages.

...for which ample training data is available.

> This makes sense, given that they are derived from text translation systems.

...for languages with ample training data available.

Yes, LLMs can combine information in novel ways. They are wonderful in many respects. But they make far more mistakes if they can't lean on copious amounts of training data. Invent a toy language, write a spec, and ask them to use it. They will, but they will have a hard time.

reply
I have a language I wrote for processing data pipelines. I’ve used it for years, but I can count the number of users on one hand. I wrote it partially to learn about writing a scripting language, partially because Nextflow didn’t exist yet. I still use it now because it works much better for my way of processing data on HPC clusters.

The only code that exists on the internet for this is test data and a few docs in the github repo. It’s not wildly different from most scripting languages, from a syntax point of view, but it is definitely niche.

Both Codex and Claude figured it out real fast from an example script I was debugging. I was amazed at how well they picked up the minor differences between my script and others. This is basically on next to zero training data.

Would I ask it to produce anything super complex? Definitely not. But I’ve been impressed with how well it handles novel languages for small tasks.

reply
That might be an argument for not using a novel homebrew programming language. But it's not an argument against, like, any top-100 or even top-1000 programming language, which will be adequately represented in the training data.

reply

It is if more training data results in better performance. In which case, GP will continue to use the language that is likely to have the most training data available.

reply
> It is if more training data results in better performance.

Sure. But given their lineage in text translation systems, it seems far more likely that there are diminishing returns to larger volumes of training data.

reply
They are also good at generating plausible code: the kind with no obvious bugs in it. I wouldn't be surprised if humans in the loop over-report success with these tools. Combined with decision fatigue… it's not a good recipe for humans making good decisions.

An experienced Rust developer is going to be in a better position to drive an agent to generate useful Rust code than a Python programmer with little or no Rust experience. Not sure I agree with the author that everyone should just generate reams of Rust now.

At least if you get paged at 3am to fix the 300k-line AI-generated Django blog, you'll have a chance at figuring things out. Good luck to you if Claude is down at the same time. But it's still better than if the codebase were in Rust and you had no experience with that language.

reply