Probably just 95% of the users. You know, the non-techies.
Not only will it give confidently incorrect answers, it also won't run a web search in obvious scenarios where it should.
The words here aren't meant as a warning for people in this type of community who might fall victim to this sort of thing. It's more for the general public that doesn't grasp the tools they're using, the people who won't ever wander across this article.
This, I think, is a huge reason we really need to get LLM basics classes or something similar going as soon as possible. People that others consider "smart" will talk about how great ChatGPT or something is, and then a listener will go try it out because the person they respect must be right. They'll hop on the free model, get an absurdly inferior product, and not grasp why. They'll ask something that requires a web search to augment the info, not get that web search, and assume the confidently incorrect agent is correct.
The thesis is also, I think, not entirely about lacking up-to-date info at query time; it's more diffuse than that. Someone asks what product they should use to mash potatoes, and a tool is suggested. Everyone who asks then receives that same recommendation, and instead of having a range of different styles of mashing potatoes, we all drift closer to one style, and the variety in how food is prepared slowly gets lost.
(At present, Gemini's question-answering capability, which Google kind of makes its users use, seems extremely error-prone -- much worse than competing LLMs when asked the same question.)
I recently saw a video discussing a researcher who published a fake scientific article about a fictitious disease, with bogus author names and even a warning IN the article itself that stated "This is not a real disease, this article is not real" (paraphrasing). AI still ended up picking up the article and serving information from it as if it were a real disease.
It even got cited in papers (which were later retracted, of course), but the fact that those papers got published in the first place is a serious issue.
Isn’t a lot of pretraining done by chopping sources up into short-context-window-sized pieces and then shoving them into the SGD process? The AI-in-training could be entirely incapable of correlating the beginning with the end of the article in its development of its supposed knowledge base.
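To make that concrete, here's a toy sketch of the failure mode, not a description of any real training pipeline. Everything in it is an assumption for illustration: whitespace "tokens", an 8-token window, non-overlapping chunks, and the hypothetical helper `chunk_tokens`. It just shows that once a document is cut into fixed windows, a disclaimer at the top and a claim at the bottom can land in chunks that never appear together.

```python
def chunk_tokens(tokens, window):
    """Split a token list into consecutive, non-overlapping windows (hypothetical helper)."""
    return [tokens[i:i + window] for i in range(0, len(tokens), window)]


# Toy "article": a disclaimer up front, filler in the middle, the claim at the end.
# Whitespace splitting stands in for a real tokenizer (assumption).
disclaimer = "DISCLAIMER this disease is not real".split()
filler = ["filler"] * 20
claim = "the disease causes fever CLAIM".split()
document = disclaimer + filler + claim

# Window size of 8 is arbitrary, chosen only to keep the example small.
chunks = chunk_tokens(document, window=8)

# Which chunk indices contain the disclaimer marker and the claim marker?
disclaimer_chunks = [i for i, c in enumerate(chunks) if "DISCLAIMER" in c]
claim_chunks = [i for i, c in enumerate(chunks) if "CLAIM" in c]

print("number of chunks:", len(chunks))
print("disclaimer in chunks:", disclaimer_chunks)
print("claim in chunks:", claim_chunks)
```

If the chunks are fed to training independently, nothing in the chunk holding the claim says the article was fake. Whether real pipelines behave like this depends on window length and how documents are packed, which I don't know for any specific model.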