Thinking vs. non-thinking, though, so there'll be a token cost. Still fairly remarkable!
Is there a reason we can't use thinking completions to train the non-thinking mode, i.e. run gradient descent toward what the thinking model would have answered?
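To make the question concrete, here is a rough sketch of what I mean. It is a toy, not how any lab actually does it. It assumes the thinking model wraps its reasoning in `<think>...</think>` tags (an assumption, formats differ), and the 3-token "student" and the `teacher_token` index are made up for illustration.

```python
import math
import re

# Assumption: the thinking model wraps its reasoning in <think>...</think>.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_reasoning(completion: str) -> str:
    """Drop the reasoning trace and keep only the final answer text."""
    return THINK_BLOCK.sub("", completion).strip()

def build_sft_pairs(samples):
    """Turn (prompt, thinking completion) pairs into (prompt, answer) targets."""
    return [(prompt, strip_reasoning(completion)) for prompt, completion in samples]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-probability the student assigns to the teacher's token."""
    return -math.log(softmax(logits)[target])

def sgd_step(logits, target, lr=0.5):
    """One gradient step on cross-entropy: d(loss)/d(logit_i) = p_i - 1[i == target]."""
    probs = softmax(logits)
    return [x - lr * (p - (1.0 if i == target else 0.0))
            for i, (x, p) in enumerate(zip(logits, probs))]

# Step 1: build non-thinking training targets from thinking completions.
samples = [("What is 17 * 3?", "<think>17*3 = 51</think>51")]
pairs = build_sft_pairs(samples)

# Step 2: nudge a toy student toward the teacher's answer token.
student_logits = [0.0, 0.0, 0.0]  # hypothetical 3-token vocabulary
teacher_token = 2                 # hypothetical index of the teacher's answer token
loss_before = cross_entropy(student_logits, teacher_token)
student_logits = sgd_step(student_logits, teacher_token)
loss_after = cross_entropy(student_logits, teacher_token)
```

The student only ever sees the final answer as its target, so whatever the reasoning contributed has to be absorbed into the weights rather than spent as tokens at inference time.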
From what I've read, that's already part of their training. They are scored based on each step of their reasoning and not just their solution. I don't know if it's still the case, but for the early reasoning models, the "reasoning" output was more of a GUI feature to entertain the user than an actual explanation of the steps being followed.