Give me a question that an LLM answers vastly differently across runs.
I keep hearing how it's dumb and wrong, but no one ever shares the chat or prompt.
How many days of the week contain the letter d?
With ChatGPT and Grok I get 3; with Claude I get 6.
In Firefox I got 6. In Chrome I got 7. LLMs are not even self-consistent.
I have the screenshots if anyone cares.
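For anyone comparing runs: the question has a checkable answer, so each model's reply can be scored against ground truth rather than against another model. Here is a rough sketch in Python, assuming English day names and a case-insensitive match. The helper name days_containing is made up for illustration.

```python
# English day names (assumption: the question refers to these spellings).
DAYS = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def days_containing(letter, days=DAYS):
    """Return the day names that contain `letter`, ignoring case."""
    needle = letter.lower()
    return [day for day in days if needle in day.lower()]


matches = days_containing("d")
print(len(matches), matches)
```

Every name ends in "day", so under these assumptions the count is 7, and any other number in a reply is a miss.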