> "Meanwhile you have multiple Fields Medalists (Tau, Gowers) saying they’re very impressed by LLMs’ mathematical reasoning, something that the stochastic parrots thesis (if it has any empirically-predictive content at all) would predict was impossible. I doubt Tau and Gowers thought much of LLMs a few years ago either. But they changed their minds. Who do you want to listen to?"
I don't understand why these two things are supposed to be incompatible.
Larger models and further refinement reduce the "haphazardness" of produced text. A big enough model, with enough semantic connections between different words/phrasings/etc. plus enough logical connections about how cause and effect, or question and answer, work in human language, can obviously stitch together novel sequences when presented with novel prompts. (The output has not been limited to sequences of n words that appear 1:1 in the training data, for any n, for at least three and a half years now, if not since the paper was written.)
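To make that parenthetical concrete, here's a toy sketch (in Python, with made-up placeholder strings standing in for real training data and model output) of how one could check whether generated n-grams appear verbatim in a corpus:

```python
# Sketch: measure what fraction of n-grams in a generated text never occur
# verbatim in a (toy) training corpus. The strings below are placeholders,
# not real data.

def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def novel_ngram_fraction(corpus: str, generated: str, n: int) -> float:
    """Fraction of n-grams in `generated` that do not appear in `corpus`."""
    corpus_ngrams = ngrams(corpus.split(), n)
    gen_ngrams = ngrams(generated.split(), n)
    if not gen_ngrams:
        return 0.0
    novel = [g for g in gen_ngrams if g not in corpus_ngrams]
    return len(novel) / len(gen_ngrams)

# Even with heavy vocabulary overlap, longer n-grams in the output are
# typically not literal copies of the training text.
corpus = "the cat sat on the mat and the dog slept by the door"
generated = "the dog sat on the mat by the door"
print(novel_ngram_fraction(corpus, generated, n=4))
```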
"without any reference to meaning" veers into the philosophical (see how much "intent" is brought up in the linked post today). But has anything been proven wrong about the idea that the text prediction is based on probabilistic evaluation based on a model's training data? E.g. how can you prove "reasoning" vs "stochastic simulated reasoning" here?
Perhaps a useful counterfactual (but hopelessly expensive and possibly infeasible) would be to see if you could train a completely irrational LLM. Would such a model be able to "reason" its way into realizing its entire training data was based on fallacies and intentionally misleading statements and connections, or would it produce consistent-with-its-training-but-logically-wrong rebuttals to attempts to "teach" it the truth?
> without any reference to meaning
is vague, but I read it as actually quite a strong claim about the limitations of LLMs. I don’t think it would be possible for LLMs to produce long chains of correct mathematical reasoning about novel problems “without any reference to meaning.” That simply isn’t possible by regurgitating and remixing random chunks of training data. Therefore I consider the stochastic parrots picture of LLMs to be wrong.
It might have been an accurate picture in 2020. It is not an accurate picture now. What is often missed in these discussions is that LLM training now looks totally different from how it did a couple of years ago. RLVR (reinforcement learning with verifiable rewards) completely changed the game, allowing LLMs to actually do math and code well, among other things.
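For anyone unfamiliar with RLVR, the core idea is: sample candidate solutions, score them with a programmatic verifier instead of a human label, and feed the rewards into a policy update. A rough sketch follows; `model.generate` and `model.update` are hypothetical stand-ins for the real RL machinery (PPO/GRPO-style updates, reward extraction from chains of thought, etc.):

```python
# Minimal sketch of the "verifiable reward" part of RLVR: sample several
# candidate solutions per problem, score each with a programmatic checker,
# and hand the rewarded trajectories to a policy-gradient update.
# `model.generate` and `model.update` are hypothetical placeholders.

def verify(problem: str, answer: str) -> float:
    """Reward 1.0 if the answer matches ground truth, else 0.0."""
    expected = str(eval(problem))          # toy arithmetic problems only
    return 1.0 if answer.strip() == expected else 0.0

def rlvr_step(model, problems, samples_per_problem=4):
    trajectories = []
    for problem in problems:
        for _ in range(samples_per_problem):
            answer = model.generate(problem)   # sample a solution
            reward = verify(problem, answer)   # verifiable, no human label
            trajectories.append((problem, answer, reward))
    model.update(trajectories)                 # policy-gradient update
    return sum(r for _, _, r in trajectories) / len(trajectories)
```

The key property is that the reward comes from a checker (a unit test, an equation solver, an exact-match answer), so the model gets a clean training signal on exactly the kinds of tasks where "stochastic parroting" was supposed to fail.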