Training on the CoT itself is pretty dubious since it's reward hacked to some degree (as evident from e.g. GLM-4.7 which tried pulling that with 3.0 Pro, and ended up repeating Model Armor injections without really understanding/following them). In any case they aren't trying to hide it particularly hard.