Basically just Mad Libs: the models generate intermediate tokens that help predict a better answer, based on training (RLHF and otherwise). Those tokens tend to look like "reasoning" because they correlated with accepted answers during training.

Extended thinking passes are just more of the same. The entire methodology exists merely to provide additional context for the autoregression process; there is no traditional computation occurring.
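To make the claim concrete, here's a toy sketch (not a real model; the `next_token` lookup table is a made-up stand-in for a network's learned prediction). The point it illustrates: "thinking" tokens come out of exactly the same autoregressive loop as answer tokens, and their only effect is to sit in the context that later predictions condition on.

```python
def next_token(context):
    # Hypothetical stand-in for a model's next-token prediction.
    # A real model computes this from learned weights over a vocabulary.
    table = {
        (): "<think>",
        ("<think>",): "step",
        ("<think>", "step"): "</think>",
        ("<think>", "step", "</think>"): "answer",
    }
    return table.get(tuple(context), "<eos>")

def decode(max_tokens=10):
    # One loop generates everything: "thinking" tokens and the answer
    # alike are appended to the same growing context. There is no
    # separate reasoning engine, just more conditioning tokens.
    context = []
    while len(context) < max_tokens:
        tok = next_token(context)
        context.append(tok)
        if tok == "<eos>":
            break
    return context

print(decode())  # ['<think>', 'step', '</think>', 'answer', '<eos>']
```

"Extended thinking" in this picture is just letting the loop emit more tokens between the prompt and the final answer.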
