For Gemini it seems to always pick 2.5 despite 3.1 being the latest, Claude the 3.5-era models.
Not sure what’s preventing AI labs on ensuring this stuff is refreshed during training.
[0] https://developers.googleblog.com/closing-the-knowledge-gap-...