> And how much?

Mercor, one of the larger vendors for contracting with experts to create bespoke data, says on their webpage they're paying $3M/day to their contractors for data.

At $3M/day that is roughly $1.1B a year from this one vendor, so total spending on bespoke training data is plausibly well into the billions of dollars a year.

That also ignores the RLVR data labs can get from software: they can use vibe-coding sessions as training data at no extra cost.

They are just one of many.

by blovescoffee 5 hours ago

Companies like Mercor sell data from human experts.

by trothamel 4 hours ago

Offhand, do you know what format that data is in? Is it a question and then a human answering that question? Mostly just curious as to what the training data consists of.

by jmalicki 4 hours ago

The most advanced training data takes the form of rubrics as rewards.

A human asks a question, then writes rubrics for judging the LLM's response. Because the rubrics can score any response rather than one specific response, they stay useful as the LLM evolves and gives different answers. There are more complex variants as well, but that's the basic principle.

https://arxiv.org/abs/2507.17746
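To make the idea concrete, here is a rough sketch of how a rubric could be turned into a scalar reward. This is not taken from the linked paper or from any vendor's pipeline: the `Criterion` class, the `rubric_reward` function, the weights, and the example rubric are all hypothetical, and the keyword checks stand in for what would realistically be an LLM judge.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Criterion:
    """One rubric item written by a human expert (hypothetical structure)."""
    description: str
    weight: float
    # Stand-in for a judge: returns True if the response satisfies the item.
    # In practice this would more likely be a call to an LLM grader.
    check: Callable[[str], bool]


def rubric_reward(response: str, rubric: List[Criterion]) -> float:
    """Return the weighted fraction of rubric items the response satisfies."""
    total = sum(c.weight for c in rubric)
    if total == 0:
        return 0.0
    earned = sum(c.weight for c in rubric if c.check(response))
    return earned / total


# Hypothetical rubric for the question "What kind of drug is ibuprofen?"
rubric = [
    Criterion("States that ibuprofen is an NSAID", 2.0,
              lambda r: "nsaid" in r.lower()),
    Criterion("Suggests checking with a doctor", 1.0,
              lambda r: "doctor" in r.lower()),
    Criterion("Gives a non-empty answer", 1.0,
              lambda r: len(r.strip()) > 0),
]

# The same rubric scores any response, including ones from later models.
old_answer = "Ibuprofen is an NSAID."
new_answer = "Ibuprofen is an NSAID; ask your doctor before combining it with other drugs."
old_score = rubric_reward(old_answer, rubric)
new_score = rubric_reward(new_answer, rubric)
```

The point of the design is that the rubric is reusable: nothing in it refers to a particular response, so it can keep scoring whatever the model produces as training goes on.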

by dominotw 4 hours ago

Meta has reallocated a significant portion of its staff to generating this.

by sroussey 2 hours ago

Meta also reportedly took a 49% nonvoting stake in Scale AI in June 2025 for about $14.3–$14.8 billion.