You realize in regards to only using and not training LLMs you are in the triple 9 majority right. Even if we only considered so called coders
NVIDIA Nemotron-Personas-USA — 1 million synthetic Americans whose demographics match real US census distributions
https://huggingface.co/datasets/nvidia/Nemotron-Personas-USA