They are training them on decompilation, reverse engineering, black-box reimplementation, and pentesting because it’s one of the best ways to generate interesting and rare RL traces for agentic coding AND teach them how lots of things work under the hood.
Just throw Claude at millions of binaries and you can get amazing training data. Oh wait, 4.7 gives you refusals for that now.