It is purely a test of capabilities (can it do a thing that takes a human $X hours), not efficiency (how fast will it do it).
I, at least, want AI to solve my problems, not score high on an academic leaderboard.