> This is useful when you simply can’t hear someone very well or when the speaker makes a mistake
I have a few friends with pretty heavy accents and broken English. Even my partner makes frequent mistakes as a non-native English speaker. It's made me much better at communicating, but it's also more work, and miscommunication happens more easily. I think a lot of people don't realize this also happens with variation in culture, even among people speaking the same language. It's just that an accent serves as a flag to "pay closer attention". I suspect this is a subtle but contributing cause of miscommunication online and part of why fights are so frequent.

Are you criticizing LLMs? Highlighting the importance of this training and why we're trained that way even as children? That it is an important part of what we call reasoning?
Or are you giving LLMs the benefit of the doubt, saying that even humans have these failure modes?[0]
Though my point is more that natural language is far more ambiguous than I think people give it credit for. I'm personally always surprised that a bunch of programmers don't understand why programming languages were developed in the first place. The reason they're hard to use is precisely their lack of ambiguity, at least compared to natural languages. And we can see clear trade-offs with how high-level a language is: duck typing is incredibly helpful while also being a major nuisance. It's the same reason even a technical manager often has a hard time communicating instructions. Compression of ideas isn't very easy.
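A minimal Python sketch of that duck-typing trade-off (purely illustrative, not something from the thread):

    def total_length(items):
        # Convenient: any iterable whose elements support len() "just works" --
        # lists of strings, tuples of lists, whatever.
        return sum(len(item) for item in items)

    print(total_length(["abc", "de"]))      # 5, no type declarations needed

    try:
        print(total_length(["abc", 42]))    # the ambiguity only bites at runtime...
    except TypeError as e:
        print(f"TypeError: {e}")            # ...and the error surfaces inside the helper,
                                            # far from the call site that caused it

The flexibility is exactly what makes it easy to misuse: nothing in the signature says what "items" has to be.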
[0] I've never fully understood that argument. Wouldn't we call a person stupid for giving a similar answer? How does the existence of stupid people mean we can't call LLMs stupid? It's simultaneously anthropomorphising and mechanistic.
I did not catch that in the first pass.
I read it as the casualties, who would be buried wherever the next of kin or the will says they should.
That's also something people seem to miss in the Turing Test thought experiment. I mean, sure, just deceiving someone is a thing, but even the simplest chatbot can achieve that. The really interesting implications start when there's genuinely no way to tell a chatbot apart from a human.
The problem is that most LLMs answer it correctly (see the many other comments in this thread reporting this). OP cherry-picked the few that answered it incorrectly, not mentioning any that got it right, implying that 100% of them got it wrong.
That seems problematic for a very basic question.
Yes, models can be harnessed with structures that run queries 100x and take the "best" answer, and we can claim that if the best answer gets it right, models therefore "can solve" the problem. But for practical end-user AI use, high error rates are a problem and greatly undermine confidence.
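Roughly, that kind of harness is just repeated sampling plus a vote, something like this sketch (sample_model is a hypothetical stand-in for a real, stochastic LLM call):

    import random
    from collections import Counter

    def sample_model(prompt: str) -> str:
        # Hypothetical stand-in for one stochastic LLM call;
        # pretend it answers correctly ~70% of the time.
        return "right" if random.random() < 0.7 else "wrong"

    def best_of_n(prompt: str, n: int = 100) -> str:
        # Sample n independent answers and keep the most common one
        # (a majority-vote / self-consistency style harness).
        answers = [sample_model(prompt) for _ in range(n)]
        return Counter(answers).most_common(1)[0][0]

    print(best_of_n("some tricky question"))  # almost always "right"

In this toy setup the harness nearly always lands on the majority answer, but a single un-harnessed query is still wrong ~30% of the time, and that single query is what most end users actually run.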
You can even see this in this very thread. Some commenters even believe that they add internal prompts for this specific question (as if people aren't attempting to fish out ChatGPT's internal prompts 24/7, and as if there aren't open-weight models that answer this correctly).
You can never win.
I know nothing about chemistry. My smartest move was to not provide the color and ask what the color might have been. It never guessed blue or purple.
In fact, it first asked me if this was high school or graduate chemistry. That's not... and it makes me think I'll only get answers to problems that are easily graded, and therefore have only one unambiguous solution.
But what's the question? Are you trying to fix it? Just determine what's rusting?
Although, now that I look closely at them, the butter knife got eaten away in spots and it's already pretty cheap, so I'll toss it.
The difference between someone who is really good with LLMs and someone who isn't is the same as the difference between someone who's really good at technical writing, or at working with other people, and someone who isn't.
Communication. Clear, concise communication.
And my parents said I would never use my English degree.
It would be interesting to actually ask a group of people this question. I'm pretty sure a lot of people would fail.
It feels like one of those puzzles that people often fail. E.g.: 'Ten crows are sitting on a power line. You shoot one. How many crows are left to shoot?' People often treat it as a subtraction problem and don't consider that animals flee after gunshots. (BTW, ChatGPT also answers 9.)