> And how much?

Mercor, one of the larger vendors for contracting with experts to create bespoke data, says on their webpage they're paying $3M/day to their contractors for data.

That alone is roughly $1.1B a year ($3M × 365), so total spending on bespoke training data is well into the billions of dollars a year.

That's also ignoring the RLVR (reinforcement learning with verifiable rewards) data labs can get from software: they can use vibe coding sessions as training data as well, without paying more.

And Mercor is just one of many such vendors.

reply
Companies like Mercor sell data from human experts.
reply
Offhand, do you know what format that data is in? Is it a question and then a human answering that question? Mostly just curious as to what the training data consists of.
reply
The most advanced training data is in the form of rubrics as rewards.

A human asks a question, then writes rubrics for judging the LLM's response. Because the rubrics can score any response rather than one specific response, they live on as the LLM evolves and gives different answers. There are more complex variants as well, but that's the basic principle.

https://arxiv.org/abs/2507.17746
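
To make the basic principle concrete, here is a minimal sketch of a rubric turned into a reward. Everything in it is an assumption for illustration: the names `RubricItem`, `rubric_reward`, and `EXAMPLE_RUBRIC` are hypothetical, the weights are made up, and the keyword checks are a stand-in for what would more likely be an LLM judge in practice. It is not the method from the linked paper, just the general shape of the idea.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RubricItem:
    """One criterion a human expert wrote for a given question (hypothetical structure)."""
    description: str
    weight: float
    # Stand-in for a judge: returns True if the response satisfies the criterion.
    check: Callable[[str], bool]


def rubric_reward(response: str, rubric: List[RubricItem]) -> float:
    """Return the weighted fraction of rubric items the response satisfies (0.0 to 1.0)."""
    total = sum(item.weight for item in rubric)
    if total == 0:
        return 0.0
    earned = sum(item.weight for item in rubric if item.check(response))
    return earned / total


# Illustrative rubric for the question "How should I store user passwords?"
EXAMPLE_RUBRIC = [
    RubricItem("Mentions salting", 1.0, lambda r: "salt" in r.lower()),
    RubricItem(
        "Recommends a slow password hash",
        2.0,
        lambda r: any(k in r.lower() for k in ("bcrypt", "argon2", "scrypt")),
    ),
    RubricItem("Does not recommend MD5", 1.0, lambda r: "md5" not in r.lower()),
]

if __name__ == "__main__":
    # The same rubric scores any answer, including answers from a later model.
    for answer in (
        "Use a salted hash with a slow algorithm such as argon2 or bcrypt.",
        "Use bcrypt.",
        "Just hash it with MD5.",
    ):
        print(f"{rubric_reward(answer, EXAMPLE_RUBRIC):.2f}  {answer}")
```

The point of the design is that the expert's work is attached to the question, not to a particular response, so the reward can be recomputed whenever the model's answers change.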

reply
Meta has reallocated a significant portion of their staff to generating this.
reply
Meta also reportedly took a 49% nonvoting stake in Scale AI in June 2025 for about $14.3–$14.8 billion.
reply