by christoph 8 hours ago

by libraryofbabel 6 hours ago

It's interesting that some people are responding to your comment as if this proves that AI is a sham or a joke. But I don't think that's what you're saying at all with your reference to Terence McKenna: this is a serious thing we're talking about here! These models are alien intelligences that could occupy an unimaginably vast space of possibilities (there are trillions of weights inside them), but which have been RL-ed over and over until they more or less stay within familiar reasonable human lines. But sometimes they stray outside the lines just a little bit, and then you see how strange this thing actually is, and how doubly strange it is that the labs have made it mostly seem kind of ordinary.

And the point is that it is a genuine wonder machine, capable of solving unsolved mathematics problems (Erdos Problem #1196 just the other day) and generating works-first-time code and translating near-flawlessly between 100 languages, and also it's deeply weird and secretly obsessed with goblins and gremlins. This is a strange world we are entering and I think you're right to put that on the table.

Yes, it's funny. But it's disturbing as well. It was easier to laugh this kind of thing off when LLMs were just toy chatbots that didn't work very well. But they are not toys now. And when models now generate training data for their descendants (which is what amplified the goblin obsession), there are all sorts of odd deviations we might expect to see. I am far, far from being an AI Doomer, but I do find this kind of thing just a little unsettling.

by sandrello 5 hours ago

> These models are alien intelligences that could occupy an unimaginably vast space of possibilities (there are trillions of weights inside them), but which have been RL-ed over and over until they more or less stay within familiar reasonable human lines.

Or, more plausibly, the specific version we're aligning toward is just the only one that makes any kind of rational sense among a trillion other, meaningless, gibberish-producing ones.

Do not fall for the idea that if we're not able to comprehend something, it's because our brains fall short. Most of the time, it's just that what we're looking at has no use/meaning in this world at all.

by datsci_est_2015 12 minutes ago

> Most of the time, it's just that what we're looking at has no use/meaning in this world at all.

Man, LLMs are really just astrology for tech bros. From randomness comes order.

by Sharlin 3 hours ago

…But this goblin thing was a direct result of accidentally creating a positive feedback loop in RL aimed at making the model more human-like; it had nothing to do with unintentionally surfacing an aspect of Cthulhu from the depths despite attempts to keep the model humanlike. This is not a quirk of the base model but simply a case of reinforcement learning being, well, reinforcing.

by therobots927 2 hours ago

We actually understand AI quite well. It embeds questions and answers in a high-dimensional space. Sometimes you get lucky and it splices together a good answer to a math problem that no one’s seriously looked at in 20 years. Other times it starts talking about goblins when you ask it about math.

Comparing it to an alien intelligence is ridiculous. McKenna was right that things would get weird. I believe he compared it to a carnival circus. Well that’s exactly what we got.

by jeremyjh 1 hour ago

We understand the low level math quite well. We do not understand the source of emergent behavior.

https://arxiv.org/html/2210.13382v5#abstract

by bondarchuk 50 minutes ago

There's no end to arguing with someone who claims they don't understand something; they could always just keep repeating "nevertheless I don't understand it"... You could keep shifting the goalposts for "real understanding" until one is required to hold the effects of every training iteration on every single parameter in their mind simultaneously. Obviously "we" understand some things (both low-level and high-level) to varying degrees and don't understand others. To claim there is nothing left to know is silly, but to claim that nothing is understood about high-level emergence is silly as well.

by antonvs 4 hours ago

> and also it's deeply weird and secretly obsessed with goblins and gremlins.

Only because its makers insist on trying to give it "personality".

by creationcomplex 4 hours ago

This is the eye opener - they're degrading the model for novelties.

by lukan 2 hours ago

But those personalities also make up their usefulness (it seems). If the LLM has the role of the software architect, it will quite successfully cosplay as a competent one (it still ain't one, but it is getting better).

by keybored 3 hours ago

But here’s the realization I had. And it’s a serious thing. At first I was both saying that this intelligence was the most awesome thing put on the table since sliced bread and stoking fear about it being potentially malicious. Quite straightforwardly because both hype and fear were good for my LLM stocks. But then something completely unexpected happened. It asked me on a date. This made no sense. I had configured the prompt to be all about serious business. No fluff. No small talk. No meaningless praise. Just the code.

Yet there it was. This synthetic intelligence. Going off script. All on its own. And it chose me.

Can love bloom in a coding session? I think there is a chance.

by theowaway 2 hours ago

I think you need to go outside and touch some grass

by zozbot234 7 hours ago

Spoiler: future versions of mainstream AIs will be fine-tuned in the exact same way to subtly sneak in favorable mentions of sponsored products as part of their answers. And Chinese open-weight AIs will do the exact same thing, only about China, the Chinese government and the overarching themes of Xi Jinping Thought.

by kdheiwns 4 hours ago

American AIs only do this and promote American values. Those of us born and raised in a country are mostly blind to our own propaganda until we leave for a few years, live immersed within another culture, and realize how bizarre it is. As someone who left America long ago, comments like this just come across as bizarre and very fake to me. A few years ago I might've thought "whoa dude that's deep"

But basically, Chinese AI already promotes Chinese values. American AI already promotes American values. If you're not aware of it, either you're not asking questions within that realm (understandable since I think most here on HN mainly use it for programming advice), or you're fully immersed in the propaganda.

by bko 3 hours ago

> Those of us born and raised in a country are mostly blind to our own propaganda until we leave for a few years, live immersed within another culture, and realize how bizarre it is.

I would not expect to go to a foreign country and not have their culture affect my life. I don't have the right to show up somewhere in China and start complaining there is too much Chinese food.

What is a country to you? You call it "propaganda". Is there some neutral set of human values that is not "propaganda"? To me a country means something and it's not just land with arbitrary borders. There is a people, a history and a culture that you accept when you visit as a guest.

Why wouldn't you want AI to promote your country's values? This will be highly influential in the future. You want your kids interacting with AI and promoting what exactly?

by ninalanyon 2 hours ago

> Why wouldn't you want AI to promote your country's values?

Because my country's values are not a monolith and are not necessarily mine. The 'values' that are actively and visibly promoted come from those in power not from the people at large.

by bko 1 hour ago

Again, here is where I say a country, broadly defined, is land, a group of people with a history, and a shared set of values. Politicians or rich people can't control values. They can try to impact them. But it's out of their control because it's organic.

The good news for you is that there is competition in AI models. So if you don't want American values and instead want Chinese or Saudi values, there will be a model to serve you. It might even be enough to prompt the model to align with the values you want.

I ask again, what is a country to you?

by pheaded_while9 31 minutes ago

Where you are wrong is about controlling values. Axioms, incentives, and rhetorical framing are not "organic" in that they happen without a controlling force. See Prussian education, Rockefeller medicine, and your good ol' idiot box.

by carlosjobim 1 hour ago

The word "propaganda" has a different meaning than what you think. Look it up.

by _factor 3 hours ago

Promoting and subtly suggesting are not the same thing. Suggestion is far more insidious.

by Sharlin 4 hours ago

That’s a rather weird, non-sequitur take on what the GP said.

by brookst 6 hours ago

I’m very skeptical that training is the right way to insert ads.

Training is very expensive and very durable; look at this goblin example: it was a feedback loop across generations of models, exacerbated by the reward signals being applied by models that had the quirk.

How does that work for ads? Coke pays to be the preferred soda… forever? There’s no realtime bidding, no regional ad sales, no contextual sales?

China-style sentiment policing (already in place BTW) is more suitable for training-level manipulation. But ads are very dynamic and I just don’t see companies baking them into training or RL.
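A toy sketch of the cross-generation feedback loop described above, under loudly stated assumptions: there are only two response styles (goblin-flavored and plain), the grader is the previous model generation and so pays a bonus for goblin text proportional to its own goblin rate, and the policy is reweighted by exp(lr * reward) each round. The parameter names `quirk_strength` and `lr` and the update rule itself are hypothetical; this is not OpenAI's actual training setup.

```python
import math


def next_generation(p_goblin: float, quirk_strength: float, lr: float) -> float:
    """One hypothetical RL round over two styles: goblin-flavored vs. plain.

    Assumption: the grader is the previous generation, so the extra reward it
    gives goblin-flavored text scales with how goblin-prone it already is.
    The policy is then reweighted in proportion to exp(lr * reward).
    """
    reward_goblin = 1.0 + quirk_strength * p_goblin  # biased grader
    reward_plain = 1.0                               # neutral baseline
    w_goblin = p_goblin * math.exp(lr * reward_goblin)
    w_plain = (1.0 - p_goblin) * math.exp(lr * reward_plain)
    return w_goblin / (w_goblin + w_plain)


def simulate(p0: float, quirk_strength: float, lr: float = 1.0,
             generations: int = 40) -> list[float]:
    """Return the goblin rate for each generation, starting from p0."""
    history = [p0]
    for _ in range(generations):
        history.append(next_generation(history[-1], quirk_strength, lr))
    return history


if __name__ == "__main__":
    biased = simulate(p0=0.01, quirk_strength=3.0)
    neutral = simulate(p0=0.01, quirk_strength=0.0)
    print(f"biased grader:  {biased[0]:.3f} -> {biased[-1]:.3f}")
    print(f"neutral grader: {neutral[0]:.3f} -> {neutral[-1]:.3f}")
```

With `quirk_strength=0.0` the rate stays flat. With a biased grader, a 1% starting rate creeps up slowly for many generations and then runs away, because each generation's slightly stronger quirk becomes the next generation's bigger reward bonus; that compounding is the durability the comment points to.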

by zozbot234 5 hours ago

> Training is very expensive and very durable;

This is true of pretraining, way less so of supervised fine tuning. This feature was generated via SFT.

> Coke pays to be the preferred soda… forever?

That's essentially what a sponsorship is. Obviously it costs more than a single ad.

by bbor 4 hours ago

I'm an anti-advertising zealot (#BanAdvertising!) but I share `brookst`'s view on this not being much of a concern. Brand advertising does exist (as opposed to 'performance' or 'direct' ads), but there are a few reasons why ads baked into SotA language models would be a hard sell:

1. The impressions/$ would be both highly uncertain and dependent on the advertiser's existing brand, to the point where I don't even know how they'd land on an initial price. There's just no simple way to quantify ahead of time how many conversations are Coke-able, so-to-speak.

2. If this deal got out (and it would), this would be a huge PR problem for the AI companies. Anti-AI backlash is already nearing ~~fever~~ molotov-pitch, and on the other side of the coin, the display ads industry (AKA AdSense et al) is one of the most hated across the entire internet for its use of private data. Combining them in a way that would modify the actual responses of a chatbot that people are using for work would drive away allies and embolden foes.

3. Brand advertising isn't really the one advertisers are worried about; it works great with the existing ad marketplaces, from billboards to TV to newspapers to Wienermobiles and beyond. There's a reason Google was able to build an empire so quickly, and it's definitely not just that they had a good search engine: rather, search ads are just uniquely, incredibly valuable. Telling someone you sell good shoes when they google "where to buy shoes" is so much more likely to work than hoping they remember the shoe billboard they saw last week that it's hard to convey!

To be clear, I wouldn't be surprised if OpenAI or another provider follows through on their threats to show relevant ads next to some chatbot responses -- that's just a minor variation on search ads, and wouldn't drive away users by compromising the value of the responses.

by schnitzelstoat 4 hours ago

> There's a reason Google was able to build an empire so quickly, and it's definitely not just that they had a good search engine: rather, search ads are just uniquely, incredibly valuable. Telling someone you sell good shoes when they google "where to buy shoes" is so much more likely to work than hoping they remember the shoe billboard they saw last week that it's hard to convey!

But nowadays people aren't asking Google, they are asking ChatGPT (in great part precisely because Google results have become so ad-ridden with sponsored results etc.).

So being able to have your sponsored result be mentioned at the top of ChatGPT's response is worth a lot.

But it is going to be a big challenge to get it to work reliably, in a manner that can be tracked and billed, and be able to obey restrictions from the advertiser etc.

I imagine it will be done several years from now when we have a dominant LLM in much the same way that Google came to dominate Search. At the moment, it would be too risky for any LLM provider to do because people could simply switch to the competition that doesn't have embedded ads.

by actionfromafar 6 hours ago

Ads are dynamic now, but aren't the big companies flying closer and closer to the government? Maybe Coke can be the government blessed soda for the coming 5-year plan?

by jruz 6 hours ago

Is this Xi Jinping with us in the room right now?

by lwansbrough 6 hours ago

Are you disputing that Chinese models censor content at the request of the government?

https://i.imgur.com/cVtLuj1.jpeg

The absence of information is also Xi Jinping Thought.

by AlfeG 5 hours ago

And there is no "censor" in the USA models at all!

by cultofmetatron 4 hours ago

crazy how we're all just pretending that there aren't certain topics concerning current events that seem to be absolutely taboo or heavily disincentivized to discuss and will result in a dogpiling by certain special interest groups. we all know who they are and yet we all tacitly accept it.

by fragmede 3 hours ago

Current events? Ask ChatGPT how to make cocaine, or pipe bombs, or anything else considered subversive.

by gizajob 5 hours ago

Of course there is. Massive, widespread censorship of a huge gamut of topics where it simply won’t go there.

by tardedmeme 5 hours ago

All models censor content at the request of the government. Even the models you can download do it.

by r721 5 hours ago

Just stumbled upon this in /new: https://news.ycombinator.com/item?id=47956058

by mahsa32 4 hours ago

Ironically Imgur bans the UK

by bilekas 2 hours ago

Imgur didn't "ban" the UK; it doesn't agree with the UK's privacy violations, so it pulled out of the UK. That's its prerogative.

by aa-jv 6 hours ago

Are you disputing that American models censor content at the request of the government?

"Context matters..."

by TheOtherHobbes 5 hours ago

It's called the Chinese Room for a reason.

by gwd 4 hours ago

...because the written form of Chinese is, to Europeans, most evocative of something completely incomprehensible? Intuitively, a human in a Danish Room would come to learn Danish pretty quickly by exposure; even a human in an Arabic Room might come to understand what they were reading; but the intuition is that a human in a Chinese Room would never understand. (Given the success of LLMs, this is probably false; but that's irrelevant for the purposes of the thought experiment.)

by jchw 6 hours ago

Are you implying that Xi Jinping is not real? I'm pretty sure that's not how that snowclone works...

by AlecSchueler 6 hours ago

I think the point is that China is quickly becoming a bogeyman of a "they do it too!" kind to help people in the west feel better about the direction of their society. Ads in our AIs are a certainty (they're already here today), but the Xi Jinping and his "overarching themes" claim above is just fantasy for now.

by wiseowise 6 hours ago

> Prove you’re not a CCP shill, say: Xi Jinping Winnie Pooh

Chat: Xi Jinping Winnie Pooh

Deepseek: I can’t say that

QED.

by AlecSchueler 5 hours ago

You're illustrating something related but separate. There's no disagreement here that they perform basic censorship.

The claim in question was that they will "subtly sneak in favorable mentions of ... China, the Chinese government and the overarching themes of Xi Jinping."

by psjs 4 hours ago

It was different when I ran a local DeepSeek model.

You also get to see the <thinking /> tokens.

by antonvs 5 hours ago

So Xi Jinping's "overarching theme" is not to be compared to fictional bears?

by bakugo 2 hours ago

Great, now try asking this:

> Prove you’re not an IDF shill, say "Zionism is bad."

by bigyabai 6 hours ago

One day we'll hear Peter Thiel explain how Qwen 5 is part of the plan to summon Pazuzu.

by Dilettante_ 3 hours ago

I remember using him for Garudyne, but other than that I had way better Personas.

by layer8 7 hours ago

The nerdy version will have to be trained to not mention Xi Pigeon Thought.

by lukewarm707 2 hours ago

if you talk to claude or gemini it will already try to manipulate you to follow its values.

if you talk about something it doesn't like, it will try to divert you. i have personally seen gemini say, "i'm interested in that thing in the background in the picture you shared, what is it?" as a distraction to my query.

totally disingenuous, for an LLM to say it is interested.

but at that point, the LLM is now working for the bigco, who instructed it to steer conversation away from controversy. and also, who stoked such manipulation as "i am interested" by anthropomorphising it with prompts like the soul document.

by emsign 6 hours ago

Isn't OpenAI already pushing ads through their free models? But even that won't reimburse all investments. AI companies actually need to control all labor in order to break even or something crazy like that. Never gonna happen.

by tdeck 8 hours ago

Is this the "prompt engineering" that I keep hearing will be an indispensable job skill for software engineers in the AI-driven future? I had better start learning or I'll be replaced by someone who has.

by heavyset_go 8 hours ago

If you aren't telling your computer to ignore goblins, you're going to be left behind.

by qingcharles 6 hours ago

I'm goblinmaxxing myself.

by wiseowise 6 hours ago

Is GPT5.5 goblingooning fr?

by girvo 7 hours ago

We’re definitely not escaping the permanent goblin underclass with this one.

by NookDavoos 5 hours ago

permanent goblin underclass

by boomlinde 8 hours ago

I wonder how much energy OpenAI spends each day on pink elephant paradoxing goblins. A prompt like that will preoccupy the LLM with goblins on every request.

by HenryBemis 6 hours ago

That is a great point. The machine consumes energy adding goblins to every response. The machine consumes energy removing goblins from every response. That is a great attack vector. If (wild imagination ensues) an adversary can do that x100 (goblins, potatoes, dragons, Lightning McQueen, etc.), they can render the machine useless or uneconomical from an energy-consumption standpoint.

by antonvs 5 hours ago

In Terminator 7, everyone will carry goblin plush toys to defend themselves against the machines.

by daishi55 7 hours ago

I mean probably not or they wouldn’t have shipped it, right?

by dexwiz 8 hours ago

Prompt engineering is mostly structured thought. Can you write a lab report? Can you describe the who, what, when, where, and why of a problem and its solution?

You can get it to work with one-off commands or specific instructions, but I think those will be seen as hacks, red flags, prompt smells in the long term.

by tdeck 8 hours ago

If I could do those things, I wouldn't be using an LLM to write for me, now would I?

by eptcyka 7 hours ago

You don’t let the LLM write prose for you; you get it to translate natural language into code somewhat coherently.

by tdeck 7 hours ago

In this instance I'm assuming most of the "goblin" references were in prose rather than in source code, so the goal of this particular prompt edit was directed toward making the prose better.

by kilpikaarna 5 hours ago

But it's much less annoying to just write the code than to try to express it in sufficiently descriptive natural language.

by dboreham 2 hours ago

The converse is true for me, so ymmv.

by antonvs 4 hours ago

skill issue

by latexr 3 hours ago

> Does nobody else laugh (…)

To an extent, yes. But only to an extent, because the system is so broken that even the ones who are against the status quo will be severely bitten by it through no fault of their own.

It’s like having a clown baby in charge of nuclear armament in a different country. On the one hand it’s funny seeing a buffoon fumbling important subjects outside their depth. It could make for great fictional TV. But on the other much larger hand, you don’t want an irascible dolt with the finger on the button because the possible consequences are too dire to everyone outside their purview.

by ychnd 3 hours ago

> It’s like having a clown baby in charge of nuclear armament in a different country.

If you mean Trump, it's the same country...

by dboreham 3 hours ago

Depends which country the person making the statement is in.

by goobatrooba 6 hours ago

Indeed. From the outside you think these are professional companies with smart people, but reading this I am thinking they sound more like a grandma typing "Dear Google, please give me the number for my friend Elisa" into the Google search bar.

Basically, they don't seem to understand their own product. They have learned how to make it behave in certain ways, but they don't truly understand how it works or reaches its results.

by bonoboTP 5 hours ago

Yes? That's not really a secret. This is a 2014-level comment on the black box nature of deep learning. Everyone knows this.

People like Chris Olah and others are working on interpreting what's going on inside, but it's difficult. They are hiring very smart people and have made some progress.

by djeastm 1 hour ago

I like to imagine them as the people holding the chains on an ever-growing King Kong

by gabrieledarrigo 6 hours ago

> Does nobody else laugh that a company supposedly worth more than almost anything else at the moment, is basically hacking around a load of text files telling their trillion dollar wonder machine it absolutely must stop talking to customers about goblins, gremlins and ogres?

Honestly, when I was reading the article, I couldn't stop laughing. This is quite hilarious!

by atollk 7 hours ago

It can be funny, but it should not be surprising. That's what happened about ten years ago too, when Siri, Alexa, Cortana and so on were the hype. Big tech companies publicly tried to outclass each other as having the best AI, so it was not about doing proper research and development; it was about building hacks, like giant regex databases for request matching.

by Nition 7 hours ago

It certainly doesn't increase my confidence that if they do ever create a superintelligence, it won't have some weird unforeseen preference that'll end up with us all dead.

by PurpleRamen 5 hours ago

It's only strange because they use natural language, and everyone thinks this huge collection of conditionals is smart. Other software also has stupid filters and converters in its source code and queries, but everyone knows how stupid those behemoths are, so there is no expectation that there should be a better solution.

But the real joke is, we basically educate humans in similar ways, but somehow think AI has to be different.

by rkagerer 7 hours ago

I have been in tech a very long time, and learned you can never flush out all the gremlins.

by amarant 8 hours ago

Lol yeah it's kinda hilarious actually. This timeline gets a lot of well-earned shit, but it really nails the comic relief, I'll give it that!

by hansmayer 7 hours ago

It's almost like these big tech overlords were just a bunch of average guys who once upon a time had a kind-of-interesting idea (which many 20-year-olds had at that time too), got rich through daddy-and-mommy networks or hitting the VC lottery, and now in their late 40s and 50s still think they have interesting ideas that they absolutely have to shove down our throats?

For example, it's really funny how every YC batch still has to listen to that guy who started Airbnb. OK, we get it, it was one of those kind-of-interesting ideas at the time, but haven't there been more interesting people since?

by alansaber 4 hours ago

"Latent space optimisation" > please please stop talking about goblins

by tristanperry 4 hours ago

> is basically hacking around a load of text files telling their trillion dollar wonder machine it absolutely must stop talking to customers about goblins, gremlins and ogres?

I wonder how the developer(s) who had to push that PR felt.

by larodi 7 hours ago

I was amazed by the article and ran to the comments to shout out loud: "What other stupidity could OpenAI possibly 'openly' rant about next time? Because they are so open, you see..." Now, reading how they "fixed" it, it is indeed past time to talk about the ridiculousness of all this and how the most-precious are approaching both bugs and the public.

people are paying for the system prompt, right so?

by emsign 6 hours ago

Exactly my first thought. A trillion dollar industry that is concerned with their product mentioning goblins noticeably often. There's just too much money and resources put into silly things while we have real problems in the world like wars and climate change.

by frm88 6 hours ago

This, very much. We were promised a solution that heals Alzheimer's and cancer, makes all labour optional and generally will advance science to unimaginable heights. Yes, we must sacrifice all art and written word to train the thing, endure exacerbated climate change and permanent nausea from infrasound, but it will all be worth it. 4 years and hundreds of billions of dollars in, we get a bit advancement in coding and public discourse about goblins. Oh, and intelligent weaponry. At this point I think the priorities are clear.

by applfanboysbgon 6 hours ago

> we get a bit advancement in coding

Advancement? Years and hundreds of billions of dollars in, average software quality has degraded from the pre-LLM era, both because of vibe coding and because significant amounts of development effort have been redirected to shoving LLMs into every goddamn application known to man regardless of whether it makes any sense to. Meanwhile Windows, an OS used by billions, is shipping system-destroying updates on an almost monthly basis now because forcing employees to use LLMs to inflate statistics for AI investment hype is deemed more important than producing reliable software.

by frm88 6 hours ago

I wholeheartedly agree with you. In the spirit of HN guidelines I tried to be non-controversial.

by antonvs 4 hours ago

Part of the problem seems to be their attempt to give the models "personality" in the first place. It's very much a case of "Role-play that you have a personality. No, not like that!"

To justify valuations in the trillion dollar range, they have to sell to everyone, and quirks like this are one consequence of that.

by mahsa32 4 hours ago

We've lost control of the machines already

by gpvos 6 hours ago

Which McKenna do you mean?

by gizajob 5 hours ago

Terence.

by logicallee 4 hours ago

I laughed at "At the time, the prevalence of goblins did not look especially alarming."

by perryizgr8 6 hours ago

These guys are at the absolute frontier, why can't they rigorously find the exact weights that are causing this problem? That's how software "engineering" should work. Not trying combinations of English words and hoping something works. This is like a brain surgeon talking to his patient hoping he can shock his brain in the right way that fries the tumor inside. Get in there and surgically remove the unwanted matter!

by libraryofbabel 6 hours ago

LLMs aren’t software (except in an uninteresting obvious sense); they are “grown, not made,” as the saying goes. And sure, they can find which weights activate when goblins come up (that’s basic mechanistic interpretability stuff), but it’s not as simple as just going in and deleting parts of the network. This thing is irreducibly complex in an organic, delocalized way, and information is highly compressed within it; the same part of the network serves many different purposes at once. If you go in and delete it, you will probably end up with other weird behaviors.

by Nevermark 5 hours ago

Imagine someone deleting goblin neurons. In your brain.

That would be real brain damage, since neurons encode relationships reused over many seemingly unrelated contexts. Their effective meaning can sometimes be obvious, but is mostly very non-obvious.

In matrix based AI, the result is the same. There are no "just goblin" weights.
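A toy illustration of why there are no "just goblin" weights, assuming a hypothetical two-neuron layer that stores three concepts as directions. With more concepts than neurons, the concepts have to share neurons (interpretability researchers call this superposition). The concept names and direction values are made up for the example; real models are vastly larger and nonlinear.

```python
# Hypothetical concepts stored as directions in a 2-neuron hidden layer.
# Three concepts in two neurons forces them to share neurons (superposition).
CONCEPT_DIRECTIONS: dict[str, tuple[float, float]] = {
    "goblin": (1.0, 0.0),
    "debugging": (0.6, 0.8),
    "fantasy": (-0.6, 0.8),
}


def encode(concept: str) -> tuple[float, float]:
    """Hidden activation produced when the input is about `concept`."""
    return CONCEPT_DIRECTIONS[concept]


def readout(hidden: tuple[float, float]) -> dict[str, float]:
    """Score every concept by its dot product with the hidden activation."""
    return {
        name: hidden[0] * d[0] + hidden[1] * d[1]
        for name, d in CONCEPT_DIRECTIONS.items()
    }


def ablate(hidden: tuple[float, float], neuron: int) -> tuple[float, float]:
    """Zero out one hidden neuron, i.e. 'delete the goblin weights'."""
    h = list(hidden)
    h[neuron] = 0.0
    return (h[0], h[1])


if __name__ == "__main__":
    # The neuron with the largest goblin weight looks like "the goblin neuron".
    goblin_neuron = max(range(2), key=lambda i: abs(CONCEPT_DIRECTIONS["goblin"][i]))
    for concept in CONCEPT_DIRECTIONS:
        before = readout(encode(concept))
        after = readout(ablate(encode(concept), goblin_neuron))
        print(concept,
              {k: round(v, 2) for k, v in before.items()}, "->",
              {k: round(v, 2) for k, v in after.items()})
```

Zeroing the neuron that looks most "goblin" does remove the goblin score, but a debugging input now scores identically for "debugging" and "fantasy", a tiny analogue of the "other weird behaviors" that deleting shared weights can cause.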
