https://github.com/jyn514/saltwater
> https://github.com/jyn514/saltwater
This is just a frontend. It uses Cranelift as the backend. It's missing some fairly basic language features like bitfields and variadic functions. And if I'm reading the documentation right, it requires all the source code to be in a single file...
> https://github.com/ClementTsang/rustcc
This will compile basically no real-world code. The only supported data type is "int".
> https://github.com/maekawatoshiki/rucc
This is just a frontend. It uses LLVM as the backend.
I completely agree that "reweite this existing codebase into a new language" could be a very powerful tool. But the article is making much bolder claims. And the result was more limited in capability, so you can't even really claim they've achieved the rewrite skill yet.
Without enough examples to copy from (despite CPU manuals being available in the training set) the approach failed. I wonder how well it'll do when you throw it a new/imaginary instruction set/CPU architecture; I bet it'll fail in similar ways.
And it's a bit of a nasty optimization problem, because the result is all or nothing. Implementing enough optimizations to get from 60kB to 33kB is useless, all the rewards come from getting to 32kB.
If the model were retrained without any of the existing compilers/toolchains in its training set, and it could still do something like this, that would be very compelling to me.