Given exposure to enough reasoning chains, training data designed around adversarial reasoning might be key to teaching models to reason beyond what they could gather from static data.
I was under the impression that whenever LLMs try to be truly novel and have to make assumptions in areas where they weren't trained on enough data points, the results are not good. Has that changed?