https://arxiv.org/abs/2112.00114 https://arxiv.org/abs/2406.06467 https://arxiv.org/abs/2404.15758 https://arxiv.org/abs/2512.12777
First that scratchpads matter, then why they matter, then that they don’t even need to be meaningful tokens, then a conceptual framework for the whole thing.
Did you test that "caveman mode" has similar performance to the "normal" model?
A lot of communication is just mentioning the concepts.
Funny idea though. And I’d like to see a more matter-of-fact output from Claude.
Take it a step further and do something like that xkcd, where whatever you try to post gets rewritten like this, and if you want the original version posted you have to write a justification that gets posted too.
Chef's kiss
Compare with fluid dynamics; it's not hard to write down the Navier–Stokes equations, but there's a million dollars available to the first person who can prove or give a counter-example of the following statement:
In three space dimensions and time, given an initial velocity field, there exists a vector velocity and a scalar pressure field, which are both smooth and globally defined, that solve the Navier–Stokes equations.
- https://en.wikipedia.org/wiki/Navier–Stokes_existence_and_sm...
Seems reasonable, but this doesn't settle probably-empirical questions like: (a) to what degree is 'more' better? (b) how important are filler words? (c) how important are words that signal connection, causality, influence, reasoning?
So it's probably true that the "Great question!---" type preambles are not helpful, but there's definitely a lower bound on how primitive a caveman language we can push toward.