I wonder if extracting those static reasoning chains make sense given a Rich Sutton's "The Bitter Lesson" and Geoffrey Hinton's "People should stop training radiologists now.". I guess until participants make money they won't stop, not sure if they do, so far it is more about expectation of profitability as I understand.
Given exposure to enough reasoning chains, with training data that is designed around adversarial reasoning and teaching models to reason, these types of training data might be key to teaching models to reason beyond what they could gather from static data.
The boundary is pretty thin there though. E.g., Gemini recently told me that a certain papers claims that two frameworks are mathematically equivalent, while the paper shows the opposite, and yesterday Google's AI overview told me that no World Cup matches were scheduled for that day despite their being several of them. The model probably used complex reasoning to arrive at both (incorrect) answers, but superficially they look like basic errors of fact.
You write the prompt, and then write rubrics to judge the responses, and you found something the model failed at. Congratulations, you just earned $500, now do it again.
But be careful: they are watching you and they don't want you giving away their secrets!