upvote
You talk as if problem solving is a supervised (imitation) learning problem. No, it is a reinforcement learning problem, models learn by solving problems and getting rated. They generate their own training data. Optimal budget allocation is 1/3 cost pre-training, 1/3 for RL, and 1/3 on inference.
reply
> The difference is that when a human reasoner goes to solve a problem, they'll think "this kind of proof usually goes this way" - following an explicit rule enforcement.

How is this different from "probabilistic pattern selection"?

reply
Because... it's just different, that's all! OK?
reply
I don’t think there’s any evidence that “human reasoning” isn’t also based on probabilistic pattern selection.
reply
It’s amazing simple things have to be reiterated.

Perhaps it’s best if most admit they don’t have the fundamental ways of thinking to even participate in the conservation.

When all nuance is lost, the discussion must end.

reply
You should leave this site. Comments like this are not good for this site. You should go somewhere else.
reply