Does anyone know how the 158000x slowdown happened? That's quite ridiculous.
I never claimed they could! I just view this as a successful experiment. I don't think anthropic was making that claim with their experiment either.
It feels reflexive to the moment to argue against that claim, but I tend to operate with a bit more nuance than "all good" or "all bad".
The overall impression given was inaccurate and the implicit claim of a fully working end-to-end generated compiler was inaccurate. The headlines were incomplete in a way that was intentionally misleading. It was an interesting experiment and somewhat impressive but the claims were overblown. It happens.
You can call that a success (as it did something impresssive even though it failed to produce a workable C compiler) but my point in bringing this up was to show that today's models are not yet able to produce production software without close supervision, even when uncharacteristically good specs and hand-written tests exist.
Edit: Maybe uncharitably is too strong, but we're talking past each other.
> It's 2026 and the idea that even with detailed-enough requirements you can one-shot even a workable (let alone perfect) solution also needs to die.
and brought up the failed anthropic experiment as proof of that. Yes, you are talking past each other, but that is not pron's fault. It is your fault.
I don't think they tried to do that though.
> today's models are not yet able to produce production software without close supervision, even when uncharacteristically good specs and hand-written tests exist.
That's a good point anyway
Their compiler fails to compile (well, at least link) some C programs altogether, and in other cases it produces code that is 150,000x slower than a real C compiler with optimisations turned off (interestingly, the model trained on the real compiler's source code). That's not "not competitive" but "cannot be used in the real world". But even more importantly, the compiler cannot be fixed or evolved. It's bricked (at least as far as today's models' capabilities go). For any kind of software, not being able to improve or fix anything or add any new feature means it's effectively dead.
You could not use it in production even if no other C compiler existed.
- John Carmack embedded a C compiler and interpreter/runtime into Quake back in the mid 1990s as a scripting language! It was that efficient that it could be used in a real time 3D shooter. That's a solo effort as a minor component of a much larger piece of software.
- I've seen university CS courses hand out "implement a C compiler" as a homework / project exercise for students. It's not particularly difficult.
Sure, a modern C compiler like GCC has to handle inline assembly, various extensions, pragmas, intrinsics, etc... but like you said, all of those are thoroughly documented and have open source implementations to reference.
Similarly, the Rust compiler is implemented in Rust and could be used as an idiomatic reference for a generic compiler framework with input handling, parsing, intermediate representations, and so forth.
I would bet that those things are also true of at least one expensive commercial C compiler.