> https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing
It's definitely post-training bias and reinforcement. This rhetorical structure isn't too common IRL. (Or wasn't, anyway, prior to LLMs...)
https://github.com/97-things/97-things-every-programmer-shou...
I think it originally came out in 2013. If I were just introduced to it today, I would think some of the articles were AI-written with only slight prompting.
It's sad that when I write something from scratch, without any writing assistance besides spell checking, mostly for internal company consumption, I now review it to see if it smells like AI.
(Just like I would have used a dash above, but I don't anymore because it has an AI smell and I had to get out of the habit.)
I remember other articles specifically talking about English-language features of those regions (like "Okay" instead of "OK") getting into LLMs because of this.
The way they write by default is thus going to be this weird hybrid of all English styles/dialects from the last ~500 years.
The reason they are heavy with em dashes is that they were immensely popular in literature for a long time, but not so much in modern writing, so they stand out.
If you tell it to write in a specific way though, it does a good job at it.
Here's Detroit English, no messin' around. The reason why AIs love the em dash so much is simple: it's the most versatile and natural punctuation mark they can use to connect ideas and maintain flow. A large language model's primary goal is to sound human, and when people speak, they often pause, clarify, or insert a quick side-thought; the dash captures that conversational stop-and-start rhythm better than a rigid comma or a full-stop period. Plus, in the enormous amount of text the AI studies (its data), the em dash is frequently used by skilled writers as an efficient tool to replace colons, parentheses, or strong commas, so the AI simply picked up that effective writing pattern and ran with it, seeing it as the clearest and most dynamic way to structure complex sentences. That's the real deal.
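For what it's worth, steering the register really is just a prompt away. Here's a minimal sketch, assuming the OpenAI Python client; the model name and system prompt are illustrative, not anything from the thread:

```python
# Minimal sketch: steer an LLM's register via the system prompt.
# Assumes the OpenAI Python client with OPENAI_API_KEY set; model name is illustrative.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        # The style instruction lives in the system message...
        {"role": "system",
         "content": "Write in casual Detroit English. Short sentences, no em dashes."},
        # ...and the actual question goes in the user message.
        {"role": "user",
         "content": "Explain why language models lean on the em dash so much."},
    ],
)

print(response.choices[0].message.content)
```

The default voice only shows up when you don't give it one; a couple of lines of style instruction up front usually overrides it.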
At some point people got this idea that LLMs just repeat or imitate their training data, and that’s completely false for today’s models.
Fine-tuning, reinforcement, etc. are all 'training' in my book. Perhaps that's the source of your confusion over 'people got this idea'.
They are, but they have nothing to do with how frequent anything is in literature, which was your main point.
I don't think it ever got taught in my schooling; the semicolon is what they taught us to use.
Gemini doesn't yet seem to have a consistent pattern, but it's quite different, and honestly, I find it even more disturbing; Gemini Flash is like chatting with an intelligent 9-year-old. Grok feels unnaturally chirpy to me.