Compared to what comes out at the end. If you sit there watching Kimi k2.6 "think", you're like "what? no you fucking idiot!" and you get this urge to "steer" it, but very rarely is that steering actually necessary. It just winds up popping out the correct answer, and all of those "Wait! That's it! I found it! Actually ... Let me just" moments are just whatever internal processing it needed to get to the correct response. Most likely it's being self-adversarial, exploring a bunch of dumb avenues to isolate the outcome with the highest probability of being right.