The worst was when you could tell someone had kept feeding the same image back into ChatGPT to make incremental edits in a loop. The yellow filter would seemingly stack until the final result was absolutely drenched in that sickly yellow pallor, making any photorealistic humans look like they were all suffering from advanced stages of jaundice.
If there's a hint of sepia in the original image and the training data contains a lot of sepia images, it will certainly get reinforced in this process. And the original distracted-boyfriend meme has some strong sepia tones in the background. Same way that Dwayne Johnson's face looks a tad cartoonish. And in the intermediate steps they both drift towards some averaged human representation, which seems pretty accurate if you consider the real world's ethnic distribution.
- Lucretius in "De rerum natura", probably
I don't think it's training-data overrepresentation, at least not alone. RLHF and, more broadly, "alignment" are probably more impactful here. Likely combined with the fact that most people prompt very briefly, so the models "default" to whatever was the most straightforward way to get a good score.
I've heard plenty of "the system still had some gremlins, but we decided to launch anyway", but not from tens of thousands of people at the same time. That's "the catch", IMO.
All people repeat the same stories and phraseology to some extent, and some people are as bad as or worse than LLM chatbots in their predictability. I wonder if the latter have weak long-term memory on the scale of months to years, even if they remember things well from decades ago.
Learning a language is a big complex task, but it is far from real intelligence.
I was told this was possible many years ago by a researcher at Google and have never really seen much discussion of it since. My guess is the labs do it but keep quiet about it to avoid people trying to erase the watermark.
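For what it's worth, statistical watermarking of LLM output has since been published openly (e.g. the "green list" scheme from Kirchenbauer et al., 2023): seed a PRNG with the previous token, mark a pseudo-random subset of the vocabulary "green", bias generation toward green tokens, and detect by counting how far the green fraction exceeds chance. A toy sketch of the idea, with all sizes and names invented for illustration (no real tokenizer or model involved):

```java
import java.util.Random;

public class WatermarkDemo {
    // A token is "green" if a PRNG seeded by the previous token puts it
    // in the favored half of the vocabulary. (Toy version; real schemes
    // use a keyed hash so outsiders can't recompute the partition.)
    static boolean isGreen(int prevToken, int token, int vocabSize) {
        Random rng = new Random(prevToken * 2654435761L);
        boolean[] green = new boolean[vocabSize];
        for (int i = 0; i < vocabSize; i++) green[i] = rng.nextBoolean();
        return green[token];
    }

    // Detection: watermarked text should have a green fraction well
    // above the ~50% expected by chance.
    static double greenFraction(int[] tokens, int vocabSize) {
        int green = 0;
        for (int i = 1; i < tokens.length; i++)
            if (isGreen(tokens[i - 1], tokens[i], vocabSize)) green++;
        return green / (double) (tokens.length - 1);
    }

    public static void main(String[] args) {
        int vocab = 1000;
        Random r = new Random(42);
        // "Generator" that always picks a green token (extreme bias, for clarity).
        int[] marked = new int[200];
        marked[0] = r.nextInt(vocab);
        for (int i = 1; i < marked.length; i++) {
            int t;
            do { t = r.nextInt(vocab); } while (!isGreen(marked[i - 1], t, vocab));
            marked[i] = t;
        }
        // Unmarked "text": uniform random tokens.
        int[] plain = new int[200];
        for (int i = 0; i < plain.length; i++) plain[i] = r.nextInt(vocab);
        System.out.printf("marked green fraction: %.2f%n", greenFraction(marked, vocab)); // ~1.0
        System.out.printf("plain  green fraction: %.2f%n", greenFraction(plain, vocab));  // ~0.5
    }
}
```

Paraphrasing or retokenizing the text degrades the green-token statistics, which is presumably why a lab deploying this would keep the keying details quiet.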
I thought this was an established term when it comes to working with codebases comprised of multiple interacting parts.
https://softwareengineering.stackexchange.com/questions/1325...
> the term originates from Michael Feathers Working Effectively with Legacy Code
I haven’t read the book but, taking the title and Amazon reviews at face value, I feel like this embodies Codex’s coding style as a whole. It treats all code like legacy code.
FWIW, I found the concept of "seams" from that book useful back when working on some legacy monolithic C++ code a few years back. TDD is a little trickier than usual there due to peculiarities of the language (in particular its build model), and it actually makes sense to know the different kinds of "seams" and what they should vs. shouldn't be used for.
Other references (and all predate chatgpt):
>Seams are places in your code where you can plug in different functionality
>Art of Unit Testing, 2nd edition page 54
(https://blog.sasworkshops.com/unit-testing-and-seams/)
>With the help of a technique called creating a seam, or subclass and override we can make almost every piece of code testable.
https://www.hodler.co/2015/12/07/testing-java-legacy-code-wi...
> seam; a point in the code where I can write tests or make a change to enable testing
https://danlimerick.wordpress.com/2012/06/11/breaking-hidden...
Maybe it all ultimately traces back to the book mentioned before, but I don't believe it's an obscure term in the circles of java-y enterprise code/DI. In fact the only reason I know the term is because that's how dependency injection was first defined to me (every place you inject introduces a "seam" between the class being injected and the class you're injecting into, which allows for easy testing). I can't remember where exactly I encountered that definition though.
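A minimal sketch of that kind of seam in Java, with all names (OrderService, the clock supplier) invented for illustration. The hardcoded alternative would call Instant.now() directly; injecting the dependency through the constructor creates the seam where a test can plug in different behavior:

```java
import java.time.Instant;
import java.util.function.Supplier;

class OrderService {
    private final Supplier<Instant> clock; // the seam: behavior can be swapped here

    // Production code passes Instant::now; tests pass a fixed value.
    OrderService(Supplier<Instant> clock) {
        this.clock = clock;
    }

    boolean isExpired(Instant deadline) {
        return clock.get().isAfter(deadline);
    }
}

public class SeamDemo {
    public static void main(String[] args) {
        // In a test, inject a fixed "now" through the seam instead of the real clock.
        Instant fixedNow = Instant.parse("2020-01-01T00:00:00Z");
        OrderService svc = new OrderService(() -> fixedNow);
        System.out.println(svc.isExpired(Instant.parse("2019-12-31T00:00:00Z"))); // true
        System.out.println(svc.isExpired(Instant.parse("2020-06-01T00:00:00Z"))); // false
    }
}
```

Feathers' "subclass and override" seam mentioned above is the same idea without constructor injection: the test subclasses the class under test and overrides the method that fetches the dependency.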
I'm a non-native English speaker, so maybe it's a really common idiom to use when debugging?
In the future these tells will be more identifiable. It will be easier to point back at text and code written in 2026 and say more confidently, "this was written by an LLM". It takes time for patterns to form, and time for them to become noticeable. "'Smoking gun' was so early-2026 Claude." I find imagining the future looking back at now to be a refreshing perspective on our usage.
No. But it is something goblins say a lot.
Also "something shifted" or "cracked".
Then there’s the whole Pomona College thing https://en.wikipedia.org/wiki/47_(number)
[1] https://en.wikipedia.org/wiki/Blue%E2%80%93seven_phenomenon
I experienced this even second-hand when a coworker excitedly told me about an encounter with a cold reader, and I knew the answer would be "blue, 7" before he told me what the guess was. Just his recap of the conversation was enough.
I quite liked this term when it started using it. And I appreciate the consistent way it talks about coding work even when working on radically different stacks and codebases.
Frequent words I see from GPT: "shape", "seam", "lane", "gate" (especially as verb), "clean", "honest", "land", "wire", "handoff", "surface" (noun), "(un)bounded" (and sometimes "unlock")
It feels like AI really likes to pick the shortest ways to express ideas even if they aren't the most common, which I suppose would make sense if that's actually what's happening.
Another one I've noticed more recently is a slight obsession with referring to "framing".
It was using it in like every third sentence, and I was like, yeah, I have seen people say "wired" like this, but not nearly as often as it was using it, in every sentence.
It's all one big incestuous mess. In a couple of years we'll be talking about AI brainrot.
I think a lot of the “clean” stuff stems from system prompts telling it to behave in a certain way or giving it requirements that it later responds to conversationally.
Total aside: I actually really dislike that these products keep messing around with the system prompts so much. They clearly don't have a good way to tell how much a change will bias the results away from things other than whatever they're explicitly trying to correct. And why is the AI company vibe-prompting the behavior out when they could train it out and actually run it against evals?