Hacker News
by greesil, 1 hour ago
by esafak, 3 minutes ago
https://en.wikipedia.org/wiki/Reinforcement_learning#Policy
by antonvs, 27 minutes ago
> one could just call it model output.
That would be incorrect. My other reply attempts to address this.
by greesil, 18 minutes ago
But the probability vector is the output of the LLM, no?