It's really amazing we can make machines do that, and it's really depressing that we think a stochastic bullshit machine is going to give us something we can rely on.
Ask a human a question like this, and they also have a chance of getting it wrong, even when confident.
Also, I asked a thinking model with browsing enabled and got this:
> The Google Pixel 10 is expected to support Wi-Fi 7 (802.11be), based on the Qualcomm Snapdragon 8 Gen 4 / Tensor G5 chipset it will likely use, which includes an integrated Wi-Fi 7 modem. Specific finalized specs aren't confirmed until Google's official announcement.
(Model: GLM-5-Turbo, two months old, using Kilo Code in the "Ask" profile; in its thinking-token churn it reasoned that it should keep the response brief and direct. Perhaps not the best model+harness combination for this task, but it's what I had to hand that isn't quantized to shit, is a thinking model, and has a web search tool available to it.)
Why would a human know specs for a random phone off the top of their head? The human response is either "I don't know" or "let me look that up", not a hallucination.
We google something specifically because the humans within reach don't know. The goal of searching is, well, to search pages: we're trying to find a site when we use Google Search.
The goal when using an LLM is generally different; we want an answer, not a site.
Claude is OK at saying when it can’t find good information, but it’s still 50/50 on citing a source that has nothing to do with its claim.
They can still be useful, e.g. they're significantly better than classic search at finding "I want a thing that does x but not y and it must be blue, or maybe two things that can be glued together to do that". But they'll routinely miss extremely obvious answers because the related search they ran didn't turn them up, or completely screw up what something can actually do. Checking more pages of results by hand, or asking humans who know even a little about those fields, is still wildly more useful... but LLMs are absolutely slaughtering the sites where people do that, by stealing all the real traffic and sending DDoS-level automated requests.
I built a retro game clone once and I used that project as a way to try out AI. While it wasn't perfect, it definitely wasn't wrong about everything. I'd go so far as to say it was probably correct (or damn close) 75% of the time.
I see people on HN all the time saying AI is terrible, but that just isn't the experience I'm having. I'm willing to admit it may have something to do with me not being able to recognize I'm being fed bullshit. Or, I may be asking really simple questions. Who knows? But AI seems like a pretty useful tool for average people.