Can you share what you mean by this?

> when using dedicated AI resources that I'm paying for

Are there API-based search providers that structure their results differently?

reply
While I agree with what you’re saying, the typical AI agent doesn’t say “I’m not totally sure about this, should I search the web?” It often just spits out a reply based on its knowledge.
reply
That was true a year ago, but I don't think it's true today. I can't remember the last time I saw Claude or ChatGPT confidently answer a question that they should have searched for instead.

If you watch their reasoning traces they often say things like "this is a well-known historical fact so I don't need to search for it", or more frequently they fire off a bunch of searches.

reply
Anecdotally, it still happens a ton to me. They also still make super simple logic errors that they immediately reverse when pressed. For example, I asked Opus 4.7 last night how to cool off my room without making it too humid inside (indoor temp 78°F, humidity 45%; outdoor temp 64°F, humidity 99%). It suggested opening a window and assured me that the humidity would not rise above around 60% which would still be comfortable. I asked it to justify that and it said:

>You're absolutely right about the humidity — I was sloppy with that aside. If you ventilate enough to meaningfully cool the room, you're replacing indoor air with outdoor air wholesale, and you'd converge on outdoor conditions: 64°F and near-100% RH. That's miserable. The 55-60% figure I tossed out was hand-wavy nonsense — it would only hold if you barely cracked the window and mixed a tiny fraction of outdoor air in. At any ventilation rate that actually cools, you're just moving outside air inside.
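
For what it's worth, the two figures in that exchange can be sanity-checked with a few lines of arithmetic. This is a rough sketch, assuming the Magnus approximation for saturation vapor pressure and ignoring moisture sources inside the room; the function names are mine, not from any library. It takes the outdoor air (64°F, 99% RH) and asks what its relative humidity would be once warmed to a given room temperature.

```python
import math


def f_to_c(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def saturation_vapor_pressure_hpa(temp_c):
    """Saturation vapor pressure over liquid water (Magnus approximation)."""
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))


def vapor_pressure_hpa(temp_f, rh_percent):
    """Actual water vapor pressure of air at the given temperature and RH."""
    return rh_percent / 100.0 * saturation_vapor_pressure_hpa(f_to_c(temp_f))


def rh_after_temperature_change(source_temp_f, source_rh_percent, new_temp_f):
    """RH of an air parcel after it warms or cools with no moisture added or removed.

    The result can exceed 100 when cooling, which in practice means condensation.
    """
    e = vapor_pressure_hpa(source_temp_f, source_rh_percent)
    return 100.0 * e / saturation_vapor_pressure_hpa(f_to_c(new_temp_f))


# Numbers from the comment above.
indoor_e = vapor_pressure_hpa(78, 45)    # moisture currently in the room air
outdoor_e = vapor_pressure_hpa(64, 99)   # moisture in the outdoor air

# Outdoor air brought inside and warmed back up to 78°F.
outdoor_at_room_temp = rh_after_temperature_change(64, 99, 78)

print(f"indoor vapor pressure:  {indoor_e:.1f} hPa")
print(f"outdoor vapor pressure: {outdoor_e:.1f} hPa")
print(f"outdoor air warmed to 78°F: {outdoor_at_room_temp:.0f}% RH")
```

Under these assumptions the outdoor air carries more moisture than the indoor air, and warmed to 78°F it lands at a bit over 60% RH. Left at 64°F it stays near 99%. So which of the two quoted figures applies depends on how far the room temperature actually drops.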

reply
Two of the five models used (Gemini+Search and Sonar Pro) have retrieval capabilities and used search when classifying the claims. The disagreement between them is still quite significant: 42%.
reply
Here are those disagreements:

https://lite.datasette.io/?csv=https%3A%2F%2Fstatic.simonwil...

One example:

> Researchers estimate that the average person ingests about 5 grams of plastic per week, which is approximately the weight of a credit card.

Gemini retrieval: Misleading

Sonar Pro: Mostly True
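
If you want to compute a pairwise disagreement figure like this yourself, here is a minimal sketch. It assumes the rate is simply the share of claims where the two labels differ; the thread doesn't say how the 42% was actually calculated. Only the first label pair below comes from the example above, and the rest are made-up placeholders.

```python
def disagreement_rate(labels_a, labels_b):
    """Fraction of items on which two classifiers assigned different labels."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must be the same length")
    if not labels_a:
        raise ValueError("label lists must not be empty")
    differing = sum(a != b for a, b in zip(labels_a, labels_b))
    return differing / len(labels_a)


# First pair is the plastic-ingestion claim; the others are hypothetical.
gemini_labels = ["Misleading", "True", "False", "Mostly True"]
sonar_labels = ["Mostly True", "True", "False", "True"]

rate = disagreement_rate(gemini_labels, sonar_labels)
print(f"disagreement: {rate:.0%}")
```

Note that strict label comparison counts "Mostly True" vs "True" as a disagreement just like "Misleading" vs "Mostly True", which may overstate how far apart the models really are.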

reply
Taken literally, the statement is perfectly true: some researchers did estimate this, and a credit card is a fair proxy for a 5 g mass.

Was the research flagrantly incorrect? Yes. But that does not affect the truth of the statement.

reply