How much of this is expectation-setting by the heights models reach? I.e., if we could assess a consistent floor of model performance in a vacuum, would we say it's better at "AGI" than the bottom 0.1% of humans?