I don't see this as just an exercise in making a new useful thing, but as a benchmark of a SOTA model's ability to create a massive* project on its own, with some verifiable metrics of success. I believe they were able to build FFmpeg with this Rust-based C compiler?

How much would it cost to pay someone to write a C compiler in Rust? A lot more than $20k.

* massive meaning "total context needed" >> model context window

This is a nice benchmark IMO. I would be curious to see how competitors and improved models would compare.
And how long will it take before an open model recreates this? The "vibe" consensus before "thinking" models really took off was that open was ~6 months behind SotA. With the massive RL improvements, over the past 6 months I've thought the gap was actually increasing. This will be a nice little verifiable test going forward.