Based on Karpathy’s writeup the auto research would not have found this. He tells the agent to improve the model and training loop with a five minute time limit, but honestly this “hack” is so far out of distribution that it seems really unlikely an agent would find this.
reply