upvote
Agree, this feels like a distinction that needs formalising...

Passive transparency: training data, technical report that tells you what the model learned and why it behaves the way it does. Useful for auditing, AI safety, interoperability.

Active transparency: being able to actually reproduce and augment the model. For that you need the training stack, curriculum, loss weighting decisions, hyperparameter search logs, synthetic data pipeline, RLHF/RLAIF methodology, reward model architecture, what behaviours were targeted and how success was measured, unpublished evals, known failure modes. The list goes on!

reply
I'd also add training checkpoints to the list for active transparency. I think the Olmo models do a decent job, but it would be cool to see it for bigger models and for ones that are closer to state-of-the-art in terms of both architecture and algorithms.
reply