Cursor is doing that (I think with Fireworks as their provider):

https://cursor.com/blog/real-time-rl-for-composer

I'm interested in trying something similar. I was thinking of doing this for my OpenClaw agent.

About Owain Evans's work: I think he used SFT. Someone on Twitter was saying that RL is not as susceptible to what he showed. I'd like to try that.
