> So maybe the AI labs have been paying attention after all!

> I think this mainly demonstrates that the pelican on the bicycle has firmly exceeded its limits as a useful benchmark.

As acknowledged in the article.

by kzrdude, 1 hour ago

Gemini 3.1 basically takes it home on that benchmark anyway; it's done.

by nickvec, 1 hour ago

Simon mentions further along in his article that, given Jeff Dean’s post referencing the pelican-riding-a-bike task (and how good current models are at doing it), it’s no longer a great benchmark to use. Enter the opossum riding an e-scooter!

by simonw, 1 hour ago

That bit probably works better in the talk; it was a setup for a joke later on.