by stri8ted, 6 hours ago

by bonoboTP, 4 hours ago

It's just an in-joke, he doesn't intend it as a serious benchmark anymore. I think it's funny.

by Legend2440, 6 hours ago

Y'all are way too skeptical, no matter what cool thing AI does you'll make up an excuse for how they must somehow be cheating.

by toraway, 5 hours ago

Jeff Dean literally featured it in a tweet announcing the model. Personally it feels absurd to believe they've put absolutely no thought into optimizing this type of SVG output given the disproportionate amount of attention devoted to a specific test for 1 yr+.

I wouldn't really even call it "cheating," since it has improved models' ability to generate artistic SVG imagery more broadly, but the days of this being an effective way to evaluate a model's "interdisciplinary" visual reasoning abilities have long since passed, IMO.

It's become yet another example in the ever-growing list of benchmaxxed targets whose original purpose was defeated by teaching to the test.

https://x.com/jeffdean/status/2024525132266688757?s=46&t=ZjF...

by arcatech, 5 hours ago

Or maybe you’re too trusting of companies who have already proven to not be trustworthy?

by pixl97, 6 hours ago

I mean if you want to make your own benchmark, simply don't make it public and don't do it often. If your salamander on skis or whatever gets better with time it likely has nothing to do with being benchmaxxed.