upvote
This isn’t accurate - most of the style comes from the fine tuning and reinforcement learning, not from the original training data.

At some point people got this idea that LLMs just repeat or imitate their training data, and that’s completely false for today’s models.

reply
>This isn’t accurate - most of the style comes from the fine tuning and reinforcement learning, not from the original training data.

Fine-tuning, reinforcement learning, etc. are all 'training' in my book. Perhaps that's the confusion behind 'people got this idea'.

reply
> Fine tuning, reinforcement, etc are all 'training' in my books.

They are, but they have nothing to do with how frequent anything is in literature, which was your main point.

reply
Agreed. The pre-2025 base models don't write like this.
reply
So LLMs have gotten creativity recently?
reply
No, my point has nothing to do with creativity. It's that their output is tailored to look and sound a certain way in the later stages of model training; it's not representative of the original text data the base model was trained on.
reply
I used to use the em dash a lot. I refrain from doing it now. I hate that outcome.
reply
I refuse to let genAI determine what my writing style should be on principle. I may not be able to do much about the rest of the various degradations genAI brings, but I can at least stand my ground when it comes to my personal expression.
reply
>I used to use the em dash a lot. I refrain from doing it now. I hate that outcome.

I don't think it was ever taught in my schooling; the semicolon is what they taught us to use.

reply