Hacker News
by charleshn, 1 hour ago
by jdub, 4 minutes ago
Reinforcement learning for "reasoning" perturbs the model so that it generates completions in a particular chain-of-thought / alternative-selection structure. It's three next-token predictors in a trench coat.