by keeda 7 hours ago

by suzzer99 7 hours ago

> How is it different from a person believing whatever they read on the Internet?

The problem is LLMs have no capacity for shame.

My Dad got taken in by a Target gift card scam. He felt so terrible, he almost didn't even tell me about it. He may get scammed again, but not by anything remotely like that.

To LLMs, all mistakes just get washed together into the same bucket. They don't spend days feeling depressed and stupid over getting scammed. There's no giant blinking red light that says, "Never let this happen again!"

by keeda 6 hours ago

> The problem is LLMs have no capacity for shame.

I know what you mean but I can't help but be cheeky: https://www.fastcompany.com/91383271/googles-chatbot-apologi...

Jokes aside, shame does not change the underlying point though. Despite feeling ashamed for being tricked, as you point out people can still get scammed again by different tricks. I think your point is more about learning from mistakes than shame.

Which still does not change the underlying point, I suppose. Offhand I cannot think of anything that would fix this problem for LLMs that wouldn't also fix it for humans, like relying on trusted sources.

by amarant 6 hours ago

>The problem is LLMs have no capacity for shame

You seem to be implying that people do, and I'd like to contest that point *gestures wildly at everything*

by ChuckMcM 7 hours ago

This is a great point. I've added it to my list of things to raise when talking about the limitations of LLMs.

by Terr_ 6 hours ago

IMO we must take it a step further: In this context "the LLM" we're all automatically thinking-of doesn't exist, it is a fictional character we humans "see" inside a story being acted-out or read to us. (In contrast, the real-world LLM is an algorithm in a basement constantly taking documents and making them slightly longer based on trends detected in all documents.)

Therefore "the LLM can't feel shame" is true in the same way that "CyberDracula thirsts for the fluids of the innocent." Good news: Vampirism doesn't exist! Bad news: Curing Dracula is impossible, because the patient doesn't exist either. Go looking for the target mind we wanted to make more-intelligent or kinder, and it turns out to be a trick of the light.

The best we can do is change the generator process, so that the next story instead contains a different new character also named after Dracula (or a brand of LLM) that sounds smarter or is narrated with kinder actions.

by Apocryphon 7 hours ago

Perhaps the end state is going to be from the last Hitchhiker's Guide to the Galaxy book, Mostly Harmless:

> Anything that thinks logically can be fooled by something else that thinks at least as logically as it does. The easiest way to fool a completely logical robot is to feed it with the same stimulus sequence over and over again so it gets locked in a loop. This was best demonstrated by the famous Herring Sandwich experiments conducted millennia ago at MISPWOSO (the MaxiMegalon Institute of Slowly and Painfully Working Out the Surprisingly Obvious).

> A robot was programmed to believe that it liked herring sandwiches. This was actually the most difficult part of the whole experiment. Once the robot had been programmed to believe that it liked herring sandwiches, a herring sandwich was placed in front of it. Where upon the robot thought to itself, Ah! A herring sandwich! I like herring sandwiches.

> It would then bend over and scoop up the herring sandwich in its herring sandwich scoop, and then straighten up again. Unfortunately for the robot, it was fashioned in such a way that the action of straightening up caused the herring sandwich to slip straight back off its herring sandwich scoop and fall on to the floor in front of the robot. Whereupon the robot thought to itself, Ah! A herring sandwich...etc., and repeated the same action over and over again. The only thing that prevented the herring sandwich from getting bored with the whole damn business and crawling off in search of other ways of passing the time was that the herring sandwich, being just a bit of dead fish between a couple of slices of bread, was marginally less alert to what was going on than was the robot.

> The scientists at the Institute thus discovered the driving force behind all change, development and innovation in life, which was this: herring sandwiches. They published a paper to this effect, which was widely criticised as being extremely stupid. They checked their figures and realised that what they had actually discovered was “boredom”, or rather, the practical function of boredom. In a fever of excitement they then went on to discover other emotions, Like “irritability”, “depression”, “reluctance”, “ickiness” and so on. The next big breakthrough came when they stopped using herring sandwiches, whereupon a whole welter of new emotions became suddenly available to them for study, such as “relief”, “joy”, “friskiness”, “appetite”, “satisfaction”, and most important of all, the desire for “happiness”. This was the biggest breakthrough of all.

> Vast wodges of complex computer code governing robot behaviour in all possible contingencies could be replaced very simply. All that robots needed was the capacity to be either bored or happy, and a few conditions that needed to be satisfied in order to bring those states about. They would then work the rest out for themselves.

by ChuckMcM 6 hours ago

I love that book. That said, the point is more subtle than that. Current LLM attention models are limited in their feedback. Adding a form of 'shame' feedback (result is technically correct but morally bad or some such) would help here, but I doubt the folks building these things would choose to do so.

by jerf 5 hours ago

From a certain and quite valid point of view, they have no mechanism for feedback at all. Every time you start a conversation you're starting in the same state, modulo the random numbers. At most you have this very, very vague loop in that the conversations for LLM 1.0 will be fed into the training set for LLM 2.0.

Even "shame" would only apply to the current session and disappear in the next one, or eventually be compacted away.
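A toy sketch of the point above, assuming a hypothetical `FrozenModel` whose weights never change at inference time and a `Session` that holds the only mutable state (the context window). None of these names come from a real library; they only illustrate that whatever a session "learns" is gone in the next one.

```python
import random


class FrozenModel:
    """Hypothetical stand-in for a deployed LLM: weights are fixed after training."""

    def __init__(self, weights):
        self.weights = tuple(weights)  # immutable, shared by every session

    def new_session(self, seed):
        return Session(self, seed)


class Session:
    """The only mutable state is the per-conversation context and the RNG."""

    def __init__(self, model, seed):
        self.model = model
        self.rng = random.Random(seed)  # "modulo the random numbers"
        self.context = []

    def tell(self, message):
        # Feedback lands in the context window, never in model.weights.
        self.context.append(message)


model = FrozenModel([0.1, 0.2, 0.3])
first = model.new_session(seed=1)
first.tell("That was a scam. Never fall for it again.")
second = model.new_session(seed=2)  # starts empty; the lesson did not carry over
```

A "memories" text file would simply be prepended to `context` at session start, which is still inside the context window rather than a change to the weights.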

(Although honorable mention to Gemini's meltdown: https://x.com/AISafetyMemes/status/1953397827662414022 )

by suzzer99 5 hours ago

According to ChatGPT, researchers are working on models that remember personal directives across sessions, i.e., an actual personal assistant that gets to know you and your proclivities. So it's definitely on their radar. No idea how far along they are.

by jerf 4 hours ago

Unless that's something more than the already-common practice called "memories" that are text files held off to the side, that doesn't change what I meant. You can do all sorts of interesting things within the context window, but there's no feedback beyond that.

Even if a frontier-LLM-sized neural net could do something that would somehow change its net on a pervasive level in response to things that happen to it, nobody could possibly serve that in a cost-effective manner.

by suzzer99 2 hours ago

[flagged]

by amarant 6 hours ago

Damn I had forgotten about this section of the book to the point that even reading it, I only recognised the style as typical Adams.

Guess that means I'm overdue for a re-read! Jaay!

by basilikum 5 hours ago

I don't think shame is a helpful human emotion here in general. It prevents people from reaching out for help and makes many crimes much harder to tackle because the victims do not report them.

Also many victims fall for the exact same scam over and over again; to the point that lists of scam victims are sold and used as leads.

by suzzer99 5 hours ago

If a junior developer makes a dumb mistake that causes a mini-disaster, their brain makes it a priority to never make that same mistake again. They physically feel anxiety the next time they get into a similar situation, which serves as a very effective reminder not to do the same dumb thing.

LLMs make the same mistakes over and over. And even if/when they have the capacity to learn on the fly, they have no capacity to prioritize. It's all just a big haze of tokens.

That's my overall point. Humans have mistakes and then they have MISTAKES. And a whole continuum in between. LLMs just have a mish-mash of training data. I think before LLMs are more than just fancy parrots, we need to find an analogue to pain, shame, joy, fear, and the myriad other emotions that factor into human decision-making.

by jorvi 2 hours ago

Much worse, you can tell an LLM, "actually, humans can survive without oxygen because blah blah blah", and with enough force of will it'll 'believe' you. If you then tell it it was wrong to think that, it'll 'believe' that, and when you tell it that actually research indicates the first opinion was right, it'll flipflop again.

No intelligent mind would ever behave like that, not even a 5-year-old kid. Or hell, if you trick a dog a few times it'll get annoyed by your antics and go back to sleep on its pillow. An LLM, you can trick for aeons.

Yet somehow most of the AI industry has deluded itself into thinking that LLMs are on the threshold of general intelligence instead of being nothing but fancy stochastic parrots.

by idiotsecant 5 hours ago

Shame is a wildly useful human emotion. Shame of letting down the tribal unit formed basically all of civilization. Shame is good.

by incr_me 3 hours ago

Some shame is good and other shame is bad. Some guilt/shame is indicative of the development of the self, other guilt/shame is a cause and effect of stunted development of the self. I like Winnicott on this:

> How important it is, therefore, for a baby to have his mother consistently looking after him, looking after him over a period of time, surviving his attacks, and eventually there to be the object of the tender feeling and the guilt feeling and sense of concern for her welfare which come along in the course of time. Her continuing to be a live person in the baby’s life makes it possible for the baby to find that innate sense of guilt which is the only valuable guilt feeling, and which is the main source of the urge to mend and to re-create and to give.

by Animats 6 hours ago

> > But it is another good example that "AI" is just glorified search and there is not reasoning or thinking going on behind the covers.

There is false decisiveness.

Ask Google: "Is Blue Cruise available for the Ford Bronco?" (Blue Cruise is Ford's self-driving assistance system.)

Google's reply is: "Yes, BlueCruise is available for the Ford Bronco! Ford expanded its hands-free highway driving technology to include the Bronco, allowing drivers to relax on prequalified, divided highway sections. (https://keywestford.com/ford-bluecruise-expands-its-reach-to...)"

This references Ford Authority, which is sort of a fan site.[1] What seems to have happened is that somebody, or an LLM, misread Ford putting its newer infotainment and control electronics platform in more models as Blue Cruise coming to those models. That platform is a prerequisite for Blue Cruise, but does not imply self-driving capability. Then whatever fills in the Key West Ford site made it look like a certainty.

Ford itself says no Blue Cruise on the Bronco.[2] That clear info is on the Web, but Google picked up aggregation sites that got it wrong.

What this looks like is that two levels of LLM converted an irrelevant statement into a certainty.

Bing somehow cites MotorBiscuit as an authority.[3]

[1] https://fordauthority.com/2025/05/ford-bluecruise-coming-to-...

[2] https://www.ford.com/support/how-tos/ford-technology/driver-...

[3] https://www.motorbiscuit.com/self-driving-ford-mustang-bronc...

by hungryhobbit 2 minutes ago

This is a problem that existed long before search engines or even computers.

Check out "Egyptologists". Basically it was a fad in Britain for wealthy people to go to Egypt, come back, and tell everyone how great it was. This would cause other people to also go and report back.

But then what started happening was people would just read the accounts of people who actually went, and write their own books on Egypt ... without ever having gone. And of course, lots of people read their books.

Soon, Britain had this wildly distorted view of what "Egypt" was. Simple example: the British people were repressed prudes at the time, so when they got to a non-Prudish country they became a bit ... un-repressed. They fixated on sexual things, like the famous trope of the Egyptian belly dancer.

Repressed Britons back home (including the people writing books without first-hand knowledge) fixated on these aspects (because, again, they were repressed) and so there was this giant amplification of belly dancers and a similar sexual aspect ... when there was nothing especially sexy about Egypt (beyond not being as repressed).

There were other major (non-belly dancer) distortions as well of course, but the point is once you get this kind of echo chamber, it inherently creates distortions that, instead of reflecting reality, reflect the viewer's own issues.

by thewebguyd 7 hours ago

The problem with the news is who makes the decision on which outlets should be blindly trusted by the LLMs and which shouldn't? It also opens the door to government overreach, say a mandate that says LLMs must use Fox News as a source of verified, vetted information.

Barring that, we are still relying on the execs at the model companies to pick and choose news outlets, and they have their own biases.

by danudey 6 hours ago

Simplest path to the most generally reliable results:

* Trust consensus across publicly-funded news outlets from outside of the US the most

* Then consensus across private news agencies from outside of the US (across countries)

* Then individual trust from publicly-funded news outlets, then private

* Then multinational non-profit advocacy groups based outside of the US

* Then public broadcasters in the US

* Then local news agencies inside the US when the topic is relevant to local news

* Then national news agencies inside the US

All facetiousness aside, the idea should be to analyze consensus across multiple sources with different biases and agendas. Don't trust any one story from any one source, but look for multiple stories from multiple sources and synthesize results from that. Where they disagree, note it in the output. If they have a source, go analyze the source rather than taking their interpretation at face value.

Even if I thought that CNN was a thousand times more reliable than Fox News, CNN could still make mistakes, either factually or editorially and repeating those mistakes can still be damaging even if they weren't intentional or malicious.

If the Washington Post and Fox News agree on something, that doesn't mean it's more likely to be correct. If The Guardian and Die Welt agree on something, that's a more reliable signal. If CBC News and Fox News agree on something, that's a strong signal.
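A rough sketch of the consensus idea described above, not a vetted ranking scheme. The `Report` type, the `leaning` labels, the outlet names, and the 2-vs-1 weighting are all hypothetical choices made for illustration; the only thing taken from the comment is that agreement between sources with different biases should count for more than agreement between similar ones, and that a single source counts for little.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Report:
    source: str   # outlet name (placeholder)
    leaning: str  # hypothetical bias label chosen by whoever runs the system
    claim: str    # claim text, assumed already normalized


def consensus_scores(reports):
    """Score each claim by the pairs of distinct sources that back it.

    A pair with different leanings counts 2, a same-leaning pair counts 1,
    so a claim reported by a single source scores 0.
    """
    backers = defaultdict(dict)
    for r in reports:
        backers[r.claim][r.source] = r.leaning
    scores = {}
    for claim, sources in backers.items():
        scores[claim] = sum(
            2 if sources[a] != sources[b] else 1
            for a, b in combinations(sorted(sources), 2)
        )
    return scores


reports = [
    Report("outlet_a", "x", "bridge closed"),
    Report("outlet_b", "x", "bridge closed"),
    Report("outlet_c", "x", "mayor resigned"),
    Report("outlet_d", "y", "mayor resigned"),
    Report("outlet_e", "y", "aliens landed"),
]
scores = consensus_scores(reports)
```

The hard parts (deciding that two articles make the same claim, and who assigns the leaning labels) are assumed away here, which is exactly where the centralization worry raised upthread comes back in.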

Also worth a read: countries with public broadcasters have healthier democracies: https://www.niemanlab.org/2022/01/do-countries-with-better-f...

by hunterpayne 5 hours ago

On scientific topics, not a single source you listed is in any way accurate at all. And these are things that can be calculated and known with very high accuracy which aren't matters of opinion and yet these sources still get them wrong the majority of the time. And there are plenty of scientific topics which have major impact on policy. Maybe we need to take certain decisions out of the hands of the scientifically illiterate.

PS The BBC (which would be in your highest level) has had to retract stories so often over the last 3 or 4 years that it became a meme to have them apologize for being wrong because they didn't know some video source came from an ML model.

by danudey 5 hours ago

> On scientific topics, not a single source you listed is in any way accurate at all.

My rebuttal to that is twofold:

First, the discussion is about news, not science (nor about general LLM behaviour).

Second, and probably more relevant, I explicitly said 'if they have a source, go analyze the source rather than taking their interpretation at face value'. When I wrote that I was thinking specifically about what I assume is your point, which is how often news articles about scientific discoveries or science news miss, misunderstand, or exaggerate the point of the original research, sometimes to the point of being as useful to society as celebrity gossip.

> And there are plenty of scientific topics which have major impact on policy. Maybe we need to take certain decision out of the hands of the scientifically illiterate.

I would be in favour of mandating that governments make decisions based on established scientific fact rather than the vibes they wish existed, restricting the decision making to 'how do we react to these facts as a society' and not 'which facts should we imagine are true to justify the policies we want'.

> PS The BBC (which would be in your highest level) has had to retract stories so often over the last 3 or 4 years that it became a meme to have them apologize for being wrong because they didn't know some video source came from a ML model.

Aside from being a good reason to support AI fingerprinting on generated media, this is covered by my existing point:

"consensus across publicly-funded news outlets"

"the idea should be to analyze consensus across multiple sources with different biases and agendas. Don't trust any one story from any one source, but look for multiple stories from multiple sources and synthesize results from that"

If the BBC reports on something because they got duped but they're the only ones who did, then there's a distinct lack of consensus which is my main argument in my post.

Lastly, and this is generally off-topic, but at least the BBC issues retractions (which LLMs could then also consume and use in their results). There's a lot of 'news media' out there that will happily parrot talking points they wish were true, or blindly report what they're told, but have no interest in publishing retractions after they push falsehoods, deliberately or not, to their customers.

by rhdunn 3 hours ago

> First, the discussion is about about news, not science (nor about general LLM behaviour).

What if science is the news, such as:

1. advancements in fusion power; or

2. progress/status of the Artemis missions; or

3. new LLM models and/or capabilities (e.g. Project Glasswing).

With things like that you typically have a press announcement/briefing, a research paper/publication, or both. That information is then presented in newspapers/media that may obscure, misrepresent, or overly generalize the original finding/announcement.

There may also be clarifications, retractions, etc. after publication, such as with the initial announcement/publication of the proof to Fermat's Last Theorem that initially had an error that was later corrected.

by hunterpayne 5 hours ago

"First, the discussion is about about news, not science (nor about general LLM behaviour)."

That's a false dichotomy. Consider energy policy. What kind of power do you need to add to your grid? What are the risks for each type of power? How much CO2 does each type of power emit, etc? These are scientific questions which directly impact public policy and are consistently misreported by news sources.

So there is no line between these things. It is however an area where accuracy can be measured. And when we do that, it's hard to argue that allowing journalists without technical credentials to continue to have a platform is a good idea.

And I can make the same argument about several other topics including military matters. Literally, the 2 weapons systems the media hates the most have the 2 best track records on the battlefield. They aren't just wrong. They are literally the opposite of correct on many topics.

by dave7 2 hours ago

Maybe Google could come up with some fancy algorithm to give variable weight to the source pages, some sort of ranking system for pages on the web, instead of just assuming any random page contains 100% truth. Perhaps counting the tally of other pages on the web linking to this one might be one clue that this is a particularly highly ranked page? It would be quite the revolutionary idea!
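For readers who missed the joke: the idea described is roughly PageRank. Below is a minimal power-iteration sketch on a made-up three-page link graph, assuming the commonly cited damping factor of 0.85. It is a textbook illustration, not a claim about how any production search engine or AI overview weighs sources today.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by inbound links using simple power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page with no outbound links: spread its rank evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical graph: "a" and "b" both link to "c", and "c" links back to "a".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

Here "c" ends up ranked highest because two pages link to it, and "b" lowest because nothing links to it. Link counts measure popularity rather than truth, so this only helps if well-linked pages also tend to be accurate.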

by keeda 6 hours ago

I totally agree, centralization is dangerous, ideally we want any output to be corroborated by multiple, independent sources of truth. But given that the alternative is the absolutely unregulated, unaccountable, wild west of arbitrary content posted on the Internet, I cannot see a solution besides some sort of centralization of trust.

by danudey 5 hours ago

I would still maintain that the solution would be to have LLMs doing 'research' (by querying news for recent events) to ensure they're checking multiple sources, and to be explicit about which sources there were, whether those sources had sources, and whether their claims were uncorroborated or unsubstantiated.

The problem, IMHO, is that the LLMs are happily regurgitating facts from whoever, wherever, whenever. Even with a centralization of trust, e.g. 'We know La Presse is reputable and can be given the benefit of the doubt', mistakes can still be made. Without the LLMs cross-checking what they learn the output is still entirely unreliable.

by antran22 5 hours ago

People are gullible. LLMs generate tokens based on the previous tokens given to them. The LLM in Google's search box doesn't believe anything it was given; it is a Markov-esque chain that goes from "Summarize the next sentences: $SEARCH_RESULTS" to the output.
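To make that concrete, here is a small sketch of how such a prompt might be assembled. The template string is the one quoted above; the real prompt used by any search product is not public, and `build_prompt` and the snippets are made up for illustration.

```python
from string import Template

# Prompt shape quoted in the comment; the actual production prompt is unknown.
PROMPT = Template("Summarize the next sentences: $SEARCH_RESULTS")


def build_prompt(snippets):
    """Join scraped snippets into one string; provenance is lost at this point."""
    return PROMPT.substitute(SEARCH_RESULTS=" ".join(snippets))


snippets = [
    "The bridge is open to traffic.",       # hypothetical reliable page
    "The bridge has been closed for good.",  # hypothetical spam page
]
prompt = build_prompt(snippets)
```

By the time the model sees `prompt`, both snippets are just tokens in the same string, with nothing marking one as more trustworthy than the other.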

I agree that there's a problem with searching today. The line between actual meaningful content and spam is blurring, and all the meaningful indicators of the olden days for distinguishing good content from bad (polished prose, author's reputation) are now gone or unreliable. The signal/noise ratio is decreasing.

The approach to improving SNR should have been reducing/eliminating noise (flag spam sites, reputation system) and boosting signal (also maybe reputation system, whitelist/blacklist). It's a hard problem simply because of entropy: the more content you have on the internet, the more random it will all seem from the top down.

I'm not saying I have the answer to this problem; I'm really just a noob when it comes to data science. I'm just thinking that mixing a bunch of text together and letting a statistical model rehash that pile of grub into a professional, vindictive-sounding response will *not* help provide users with enough signal to make sense of what they are looking for.

by ben_w 6 hours ago

> I don't think that follows. This is just LLMs being, for a lack of a better word, "gullible." How is it different from a person believing whatever they read on the Internet? People fall for spam and scams all the time, doesn't mean they are just glorified searches ;-)

The important difference is the AI has been mass-produced and commodified at low cost.

If you scanned my brain, uploaded and ran me as a simulated mind, no matter how good the simulation was, the ability for an attacker to try a million variations to see which one slips past my cognitive blind-spots would enable them to convince me of, if not literally anything, a lot that would normally never be so.
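A back-of-the-envelope sketch of why cheap retries matter, assuming (unrealistically) that each attempt is independent and succeeds with the same small probability. The one-in-100,000 figure is invented purely for illustration.

```python
def chance_of_at_least_one_success(p, attempts):
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p) ** attempts


# Hypothetical numbers: a trick that works once in 100,000 tries.
one_shot = chance_of_at_least_one_success(1e-5, 1)
brute_force = chance_of_at_least_one_success(1e-5, 1_000_000)
```

Under these assumptions a single attempt almost never works, while a million attempts succeed with probability above 0.999. A human target gets tired, suspicious, or walks away long before the millionth try; a copyable, resettable model does not.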

by yndoendo 7 hours ago

Let's say you are a cave dweller and have lived your whole life there. I go in and tell you the world is flat and you will believe me. The only way to reject the idea that the world is flat would be to go outside of the cave.

ML cannot ever go outside the cave. It does not have real-world feedback. It also does not have a will, a type of feedback loop, to learn beyond what it was initially trained on.

ML / AI only has the ability to regurgitate what it has been trained on. Garbage in = garbage out. Feeding ML garbage is the real AI wars.

AI will always propagate misinformation. They even created a marketing term to assist in the sale of lies: hallucination.

https://en.wikipedia.org/wiki/The_Cave_and_the_Light

ML can regurgitate that book but will never be able to apply it.

by pembrook 5 hours ago

> verified, vetted information that LLMs can trust blindly. Possibly that's what deals like the OpenAI / Atlantic one are about

Except, the Atlantic does very little (if any) fact-based hard news and does very little investigative reporting. It's largely a collection of op-eds.

My guess is that deal has more to do with OpenAI cozying up to Laurene Powell Jobs (widow of Steve Jobs and owner of the Atlantic) who inherited roughly $15B in capital and is willing to spend it...specifically on things like...OpenAI's next funding round.

by cess11 7 hours ago

"How is it different from a person believing whatever they read on the Internet?"

Because a person is alive while the LLM is a floating point number database with a questionable degree of determinism.

by wslh 6 hours ago

> How is it different from a person believing whatever they read on the Internet?

Because the answers, while prompting, are clearly more human and charming than a search engine results list?

by RC_ITR 6 hours ago

You and OP are both unnecessarily diminishing what 'glorified search' is.

If you had told me that in 2015, we would have a tool that can iteratively search the world's best and largest unstructured database and synthesize outputs in language (any natural and structured language), I would have said that is basically AGI.

This whole desire for it to 'reason' (autonomously prime its search with a few thousand tokens) and 'think' (search for the best information within its parameters and synthesize that with its context) is semantic and will feel irrelevant as the technology progresses and we become more used to what these things are actually doing.

I honestly struggle to imagine what AGI will be if not an ever-improving semi-structured database (parametric or otherwise) that we become increasingly good at searching.

by Silamoth 6 hours ago

If that’s really the case, then I’d say 2015 you needed to do more reading and thinking about AGI and the nature of intelligence and consciousness. The Chinese Room thought experiment is a good starting point for thinking deeper about what AGI is.

But really, I have trouble grasping how anyone can really think database searching is intelligence. For starters, I’d say the capacity to learn on the fly with relatively poor input data is a necessary condition for intelligence, and you can’t get that with database search.

by CamperBob2 4 hours ago

Like the Turing test, the Chinese Room says more about humans than it does about machines.

by xp84 6 hours ago

> How is it different from a person believing whatever they read on the Internet?

It's not, directionally. But I think this is kind of bypassing the main point here.

With an LLM's natural tendency to pattern-match in this way, it's easy to see that it can be used to launder disinformation. If in the olden days, I'd done a google search for "worst war criminals" and saw these blue links on that SERP:

"Putin is the 21st century's worst war criminal" - support-ukraine.org

"Zelensky is the real worst war criminal" - publicrelations.government.ru

My takeaway would be that both those are claims made by third parties, one or both could be lying. Even if I only saw more results from one side than the other, most of us understood that the presence in search results doesn't imply Google's endorsement or prove anything besides the fact someone set up a webpage and wrote something.

In contrast, today a lot of people tend to ask ChatGPT something and if it spits back an answer they are - at minimum - being subtly biased that even though it may be in dispute, ChatGPT "agrees" with one position, and that carries at least a little authority. And at worst they wrongly assume that the "correct" answer was selected by deep intelligence, that a lot of data has been analyzed and this answer arrived at, rather than there just being one completely untrusted webpage somewhere that matches their query really well.

And as bad as that is with a "real" model like ChatGPT or Gemini, people also give the same respect to the idiotic, super-fast toy model Google uses for its "AI Overviews"!

by keeda 5 hours ago

Makes sense, but it seems to me that the ability to launder disinformation is more a function of the trust people put in LLMs than any inherent property of their own. As some other comments indicate, this also was and is a problem with Wikipedia. It's possible trust in LLMs will follow the same trajectory as trust in Wikipedia, which seems to have been pretty non-linear (like, we rarely see "Do not cite Wikipedia" anymore.)

I think eventually things would settle on an approach similar to your example of the links: look at multiple sources and arrive at a balanced overview that includes the trust level and biases of each source. I think the pieces are in place, just need to be put together. E.g. already AI overviews (especially on Amazon product reviews) are essentially of the form "Some say A but others say B" which has the benefit of a) clearly being second-hand information, and b) not sounding so authoritative, letting readers draw their own conclusions.

by xp84 5 hours ago

I agree with your assessment or hopes. The interesting thing is that I get the idea the average user basically grokked, in 2008, that Google itself can't answer a question for you, it can only show you a list of websites that match keywords and you have to do the work to vet them, and often to extract the answers themselves from webpages.

Today they seem to not grok (no pun intended, just think the word is fitting) that AI isn't an oracle and as such, its "opinion" on anything that could be even slightly controversial carries zero weight.

by freejazz 7 hours ago

>"gullible."

Enough with the anthropomorphization