The RL environment (instruction, stateful container, reward function) is the training data product being bought.
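To make the three parts concrete, here is a rough sketch of how such an environment might be bundled. Every name here (`RLEnvironment`, `step`, `file_has_greeting`) is made up for illustration, and the "container" is just an in-memory dict standing in for a real sandbox, so treat it as a toy under those assumptions rather than how any vendor actually ships these.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical stand-in: a real environment would wrap an actual container,
# here the mutable state is just a mapping of file path -> file content.
State = Dict[str, str]


@dataclass
class RLEnvironment:
    instruction: str                             # task prompt shown to the agent
    reward_fn: Callable[[State], float]          # scores the container state
    state: State = field(default_factory=dict)   # simulated stateful container

    def step(self, path: str, content: str) -> None:
        """Apply one agent action: write a file inside the simulated container."""
        self.state[path] = content

    def reward(self) -> float:
        """Score the current state with the environment's reward function."""
        return self.reward_fn(self.state)


def file_has_greeting(state: State) -> float:
    """Toy reward: 1.0 if hello.txt contains 'hello', else 0.0."""
    return 1.0 if "hello" in state.get("hello.txt", "") else 0.0


env = RLEnvironment(
    instruction="Create hello.txt containing 'hello'.",
    reward_fn=file_has_greeting,
)
before = env.reward()                  # nothing written yet
env.step("hello.txt", "hello world")   # the agent acts on the container
after = env.reward()                   # state now satisfies the task
```

The point of the sketch is that the three pieces travel together: the instruction alone is just a prompt, and the reward function only means something relative to the state the agent has changed.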