I'd be happy to tell you about the Marathon crater. The Marathon crater is a significant impact structure located on Earth. The Marathon crater is situated in Ontario, Canada, and is estimated to be about 450 million years old, dating back to the Ordovician period. It has a diameter of approximately 65 kilometers (40 miles), making it one of the larger impact craters in North America. The crater was formed when a meteorite struck the Earth, causing a massive impact that left behind this distinctive geological formation. Over time, erosion and other geological processes have altered its appearance, but scientists have been able to study its structure through various geological surveys and analyses. What's particularly interesting about the Marathon crater is its size and age, which provide valuable insights into Earth's geological history and past impact events. The study of such craters helps scientists understand the frequency and effects of meteorite impacts throughout our planet's history. Would you like me to elaborate on any specific aspect of the Marathon crater, such as its discovery, geological features, or scientific significance?
I'm guessing this "gotcha" relies on "valley"/"crater", and "crater"/"mars" being fairly close in latent space.
ETA: Marathon Valley also exists on the rim of Endeavour crater. Just to make it even more confusing.
[ChatGPT is bullshit ]: https://link.springer.com/article/10.1007/s10676-024-09775-5
Ask it about "Marathon Desert", which does not exist and isn't closely related to something that does exist, and it asks for clarification.
I'm not here to say LLMs are oracles of knowledge, but I think the need to carefully craft specific "gotcha" questions in order to generate wrong answers is a pretty compelling case in the opposite direction. Like the childhood joke of "Whats up?"..."No, you dummy! The sky is!"
Straightforward questions with straight wrong answers are far more interesting. I don't many people ask LLMs trick questions all day.
It doesn't "assume" anything, because it can't assume, that's now the machine works.
The Marathon Valley _is_ part of a massive impact crater.
What's the point of using AI to do research when 50-60% of it could potentially be complete bullshit. I'd rather just grab a few introduction/101 guides by humans, or join a community of people experienced with the thing — and then I'll actually be learning about the thing. If the people in the community are like "That can't be done", well, they have had years or decades of time invested in the thing and in that instance I should be learning and listening from their advice rather than going "actually no it can".
I see a lot of beginners fall into that second pit. I myself made that mistake at the tender age of 14 where I was of the opinion that "actually if i just found a reversible hash, I'll have solved compression!", which, I think we all here know is bullshit. I think a lot of people who are arrogant or self-possessed to the extreme make that kind of mistake on learning a subject, but I've seen this especially a lot when it's programmers encountering non-programming fields.
Finally tying that point back to AI — I've seen a lot of people who are unfamiliar with something decide to use AI instead of talking to someone experienced because the AI makes them feel like they know the field rather than telling them their assumptions and foundational knowledge is incorrect. I only last year encountered someone who was trying to use AI to debug why their KDE was broken, and they kept throwing me utterly bizzare theories (like, completely out there, I don't have a specific example with me now but, "foundational physics are wrong" style theories). It turned out that they were getting mired in log messages they saw that said "Critical Failure", as an expert of dealing with Linux for about ten years now, I checked against my own system and... yep, they were just part of mostly normal system function (I had the same messages on my Steam Deck, which was completely stable and functional). The real fault was buried halfway through the logs. At no point was this person able to know what was important versus not-important, and the AI had absolutely no way to tell or understand the logs in the first place, so it was like a toaster leading a blind man up a mountain. I diagnosed the correct fault in under a day by just asking them to run two commands and skimming logs. That's experience, and that's irreplaceable by machine as of the current state of the world.
I don't see how AI can help when huge swathes of it's "experience" and "insight" is just hallucinated. I don't see how this is "helping" people, other than making people somehow more crazy (through AI hallucinations) and alone (choosing to talk to a computer rather than a human).
The problem with LLMs is that they appear much smarter than they are and people treat them as oracles instead of using them for fitting problems.
Books are a nice example of this, where we have both the table of contents for a general to particular concepts navigation, and the index for keyword based navigation.
> The majority of any 101 book will be enough to understand the jargon
A prompt is faster and free, whereas I'd have to order a book and wait 3+ days for it to arrive otherwise. Because while libraries exist they focus on books in my native language and not English.
Because if you know how to spot the bullshit, or better yet word prompts accurately enough that the answers don't give bullshit, it can be an immense time saver.
The idea that you can remove the bullshit by simply rephrasing also assumes that the person knows enough to know what is bullshit. This has not been true from what I've seen of people using AI. Besides, if you already know what is bullshit, you wouldn't be using it to learn the subject.
Talking to real experts will win out every single time, both in time cost, and in socialisation. This is one of the many reasons why networking is a skill that is important in business.
Take coding as an example, if you're a programmer you can spot the bullshit (i.e. made up libraries), and rephrasing can result in entire code being written, which can be an immense time saver.
Other disciplines can do the same in analogous ways.
You realize that all you have to do to deal with questions like "Marathon Crater" is ask another model, right? You might still get bullshit but it won't be the same bullshit.
In this particular answer model A may get it wrong and model B may get it right, but that can be reversed for another question.
What do you do at that point? Pay to use all of them and find what's common in the answers? That won't work if most of them are wrong, like for this example.
If you're going to have to fact check everything anyways...why bother using them in the first place?
"If you're going to have to put gas in the tank, change the oil, and deal with gloves and hearing protection, why bother using a chain saw in the first place?"
Tool use is something humans are good at, but it's rarely trivial to master, and not all humans are equally good at it. There's nothing new under that particular sun.
The situation with an LLM is completely different. There's no way to tell that it has a wrong answer - aside from looking for the answer elsewhere which defeats its purpose. It'd be like using a chainsaw all day and not knowing how much wood you cut, or if it just stopped working in the middle of the day.
And even if you KNOW it has a wrong answer (in which case, why are you using it?), there's no clear way to 'fix' it. You can jiggle the prompt around, but that's not consistent or reliable. It may work for that prompt, but that won't help you with any subsequent ones.
You have to be careful when working with powerful tools. These tools are powerful enough to wreck your career as quickly as a chain saw can send you to the ER, so... have fun and be careful.
But with LLMs, every word is a probability factor. Assuming the first paragraph is true has no impact on the rest.
It isn't obvious to me - that is rather plausible and a cute story.