> So maybe the AI labs have been paying attention after all!

> I think this mainly demonstrates that the pelican on the bicycle has firmly exceeded its limits as a useful benchmark.

As acknowledged in the article.

Gemini 3.1 basically takes it home on that benchmark anyway; it's done.
Simon mentions further along in his article that, given Jeff Dean’s post referencing the pelican-riding-a-bike task (and how good current models are at doing it), it’s no longer a great benchmark to use. Enter the opossum riding an e-scooter!
That bit probably works better in the talk; it was a setup for a joke later on.