Also, passing tests doesn't mean something works.

The Claude Code C compiler passed 100% of the GCC tests and couldn't even run a hello world...

reply
It couldn't run "hello, world" on systems where the include files were not located in the directory that it expected -- producing instead diagnostics saying, quite clearly, that the header files were not found. On systems where they were, it built versions of postgresql, redis, and several other things which passed their test suites completely.

If you've heard this problem described as a fundamental limitation of the compiler, and not the kind of packaging glitch that's routine to find in pre-alpha software of all descriptions, whoever described it to you that way is not serving their readers well.

I'm not saying CCC was production-ready, or close -- the total lack of an optimizer would be a killer in any real use, and I assume that there were problems with the diagnostics at least as bad as problems with performance and the include files, for similar reasons -- the LLMs hadn't been asked to optimize for that stuff yet, just test suite correctness. But it did achieve that, and the amount of cope I've seen on social media claiming otherwise is more than a bit disturbing.

reply
I have a colleague who has, multiple times, committed code that doesn't work, like at all. Why? His code is only used in tests, not in the actual application. And apparently he never even bothered to click through things once, let alone review the code.

If it doesn't work, it doesn't. You can find all these excuses. But at the end of the day, there is a difference between an end user being able to get something out of your code or not.

reply
The C compiler written by Claude a few months ago was able to compile a hello world.

The main problem, I think, was that it was extremely slow.

reply
i think there's a different lesson to be taken from those cases - the LLM will build to whatever you give it a feedback loop for.

if you give it just the logical tests, it won't consider speed at all. if you include tests that measure speed and ask the llm to match the performance, it'll do that too.

it's the same class of error as everything else with llms. it has no common-sense context for the things people consider important. if you don't enforce the boundaries, it will ignore them
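
a rough sketch of what "tests that measure the speed" could look like, in python. everything here is made up for illustration: `accept`, `median_runtime`, and the 3x `max_slowdown` budget are hypothetical names and numbers, not anything from the actual compiler project.

```python
import time
from typing import Callable, Optional, Sequence, Tuple


def median_runtime(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time of calling fn, in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    samples.sort()
    return samples[len(samples) // 2]


def accept(
    candidate: Callable,
    reference: Callable,
    inputs: Sequence,
    max_slowdown: float = 3.0,  # hypothetical budget: at most 3x the reference
    measure: Optional[Callable[[Callable[[], object]], float]] = None,
) -> Tuple[bool, str]:
    """Accept the candidate only if it is both correct and fast enough."""
    measure = measure or median_runtime

    # Boundary 1: logical correctness against the reference implementation.
    for x in inputs:
        if candidate(x) != reference(x):
            return False, f"wrong result for input {x!r}"

    # Boundary 2: performance relative to the reference.
    # Without this check, nothing in the loop ever penalizes slow code.
    cand_time = measure(lambda: [candidate(x) for x in inputs])
    ref_time = measure(lambda: [reference(x) for x in inputs])
    if cand_time > max_slowdown * ref_time:
        return False, f"too slow: {cand_time:.6f}s vs reference {ref_time:.6f}s"

    return True, "ok"
```

the point is that the speed budget is an explicit failing condition, same as a wrong answer. `measure` is injectable so the check itself can be tested without depending on real timings.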

reply
The question is, are our optimization functions well specified enough? (No)

How important is a well-specified optimization function? No one knows. We will find out.

reply
Discussed here if anyone's interested:

LLMs work best when the user defines their acceptance criteria first - https://news.ycombinator.com/item?id=47283337 - March 2026 (422 comments)

reply