upvote
@simonw explains how hilariously misguided that paper is in one of the top comments, and how it doesn't apply remotely to a real agent harness. Plus it's not even clearly relevant here, because the model isn't trying to regurgitate the original document, but generate a new one, and there are guardrails to put it back on track in the form or a compiler and tests. Also, the test suite is very thorough, and pre-existing, and the vast majority passes already. This is skepticism for the sake of it.
reply
Perhaps you can elaborate on how your comment is relevant to the Bun's experiment here.
reply