Can't they just run the output through a compiler to get feedback? Syntax errors seem easier to get right.

The difference is in scaling. The top US labs have orders of magnitude more compute available than Chinese labs. The difference on general tasks is obvious once you use them. A year ago it was said that open models are ~6 months behind SotA, but with the new RL paradigm, I'd say the gap is growing. With less compute they have to focus on narrow tasks and resort to poor man's distillation, which leads to models that show benchmaxxing behavior.

That being said, this model is MIT licensed, so it's a net benefit whether it's benchmaxxed or not.

They do. Pretty much all agentic models call linting, compiling and testing tools as part of their flow.
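
The shape of that loop, for anyone curious, is roughly this (a minimal Python sketch; `model.generate` and the gcc invocation are stand-ins, not any particular lab's actual tooling):

    import subprocess
    import tempfile

    def compile_feedback(source: str) -> str | None:
        """Syntax-check a C source string with gcc; return the
        compiler's stderr on failure, None on success."""
        with tempfile.NamedTemporaryFile(suffix=".c", mode="w",
                                         delete=False) as f:
            f.write(source)
            path = f.name
        result = subprocess.run(["gcc", "-fsyntax-only", path],
                                capture_output=True, text=True)
        return result.stderr if result.returncode != 0 else None

    def generate_until_compiles(model, prompt, max_tries=3):
        # Generate, syntax-check, feed the errors back, retry.
        # `model.generate` is a hypothetical text-completion call.
        code = model.generate(prompt)
        for _ in range(max_tries):
            errors = compile_feedback(code)
            if errors is None:
                return code
            code = model.generate(prompt + "\nCompiler errors:\n" + errors)
        return code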

The new meta is purchasing RL environments where models can be self-corrected (e.g. a compiler will error out), after SFT + RLHF ran into diminishing returns. There's still lots of demand for "real world" data for actually economically valuable tasks, though.
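
In that setting the environment's reward can be as crude as "did it compile" (a toy sketch of a verifiable reward; real environments score task completion, not just syntax):

    import subprocess
    import tempfile

    def reward(completion: str) -> float:
        # Toy verifiable reward: 1.0 if the generated C code
        # passes a gcc syntax check, 0.0 otherwise.
        with tempfile.NamedTemporaryFile(suffix=".c", mode="w",
                                         delete=False) as f:
            f.write(completion)
            path = f.name
        ok = subprocess.run(["gcc", "-fsyntax-only", path],
                            capture_output=True).returncode == 0
        return 1.0 if ok else 0.0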