by Artgor 5 hours ago

by solenoid0937 5 hours ago

My Meta friends say it's benchmaxxed af

by loeg 5 hours ago

We used to call this "overfitting," but I suppose everything has to be maxxed now. Fitmaxxed?

by conradkay 5 hours ago

It doesn't seem benchmaxxed: the ARC AGI 2 score is quite bad (42.5%, versus 76.1% for GPT 5.4) and coding is okay. But maybe this is the best Meta can do even when benchmaxxing.

The impressive part is multimodality, which is very plausible since other labs (especially Anthropic) focus less on it.

by dbgrman 1 hours ago

Given that Llama 4 mucked up its benchmark numbers, I’d take the Spark announcement with many grains of salt.