upvote
It’s often very difficult (intractable) to come up with a probability distribution of an estimator, even when the probability distribution of the data is known.

Basically, you’d need a lot more computing power to come up with a distribution of the output of an LLM than to come up with a single answer.

reply
What happens before the probability distribution? I’m assuming say alignment or other factors would influence it?
reply
In microgpt, there's no alignment. It's all pretraining (learning to predict the next token). But for production systems, models go through post-training, often with some sort of reinforcement learning which modifies the model so that it produces a different probability distribution over output tokens.

But the model "shape" and computation graph itself doesn't change as a result of post-training. All that changes is the weights in the matrices.

reply