Related but distinct: a few years later I asked an acquaintance to put a question to a model. I didn't want to bias the test, so I asked them to ask whatever they wanted. They asked "What time is it in Sri Lanka?", which I thought was a funny question. I predicted it wouldn't work because it was an offline model, so I thought it wouldn't manage to get current data. Still, I didn't interfere and we watched the answer being produced. It was roughly factually correct information about Sri Lanka... but it did not give the correct time. Again, that's a rather basic question a young child would easily get right. You need the current time in a known timezone, the time difference, basic arithmetic and voilà, you have the correct answer with an explanation to verify. Here it didn't work, and I was there trying to explain how a STOA open-source model, which required thousands if not millions in resources, training time, researcher salaries, etc., could not even handle that random basic question. Another "oh shit" moment, and again not the one I expected, which is precisely why it was, and still is, interesting to me.
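To make the arithmetic concrete, here is a minimal Python sketch of the steps I listed. It assumes Sri Lanka sits at a fixed UTC+05:30 offset with no daylight saving; the function name `time_in_sri_lanka` and the sample date are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Sri Lanka uses a fixed UTC+05:30 offset with no daylight saving.
SRI_LANKA = timezone(timedelta(hours=5, minutes=30))


def time_in_sri_lanka(known_time: datetime) -> datetime:
    """Convert a timezone-aware datetime to Sri Lanka local time."""
    if known_time.tzinfo is None:
        # Without a known timezone there is no time difference to compute.
        raise ValueError("the input time must carry a known timezone")
    return known_time.astimezone(SRI_LANKA)


# Example: 12:00 in a UTC+02:00 zone. The difference is 3h30, so 15:30 there.
noon_plus2 = datetime(2024, 1, 15, 12, 0, tzinfo=timezone(timedelta(hours=2)))
print(time_in_sri_lanka(noon_plus2).strftime("%H:%M"))  # 15:30
```

The only input that has to come from outside is the current time itself, which is exactly what an offline model does not have.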
reply
"I googled 'what is my bank balance' and it couldn't even tell me. What a waste of resources."
reply
I didn't mention resources here.

The point of the test was to ask somebody with no bias on HOW the result was produced.

reply
"I couldn't remember the order of the words in 'state of the art' so I just spray and pray across the keyboard like usual. I can't tell the difference because I'm just a pattern matching bot"
reply
Oops, unfortunately too late to fix. I actually misspell it often... apologies if it caused confusion!
reply
A few years ago, as you say, this was true. Nowadays I guess you just have to bite the bullet that Erdős problems aren’t interesting.
reply
I already commented on the Erdős problems; that is also a jagged frontier.
reply
Curious what your interesting questions were, you should be able to find them in your chat history.
reply
That was more than a decade ago, so unfortunately not. I should have kept those questions though. I even mentioned in a comment on HN a while ago that unanswered or wrongly answered questions are precisely what should become a batch test when new models are released.
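A rough sketch of what I mean by a batch test, in Python. Everything here is hypothetical: `ask_model` stands in for whatever call reaches the new model, and each question is paired with a hand-written check deciding whether an answer is acceptable.

```python
from typing import Callable, Dict, List


def rerun_failed_questions(
    ask_model: Callable[[str], str],
    cases: Dict[str, Callable[[str], bool]],
) -> List[str]:
    """Return the questions the model still answers unacceptably."""
    still_failing = []
    for question, is_acceptable in cases.items():
        answer = ask_model(question)
        if not is_acceptable(answer):
            still_failing.append(question)
    return still_failing


# Hypothetical stand-in for a real model call, used only for illustration.
def fake_model(question: str) -> str:
    return "Sri Lanka is an island country in South Asia."


# Each check is a crude placeholder; a real one would be more careful.
cases = {
    "What time is it in Sri Lanka?": lambda answer: ":" in answer,
}

print(rerun_failed_questions(fake_model, cases))
```

The list that comes back is what you would keep and run again on the next release.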
reply
Here's a good one for you: "Explain the double slit experiment which way variation"

If they say anything about leaving two straight lines, then it fails. Just tried Gemini, and it failed.

This is an extremely common misconception that has spread all throughout the internet, and so it is baked into the training data. The real answer is that there are multiple ways to do which-way double slit experiments, but Einstein's thought experiment proves it's impossible for any of them to have an interference pattern, as that would violate Heisenberg's Uncertainty Principle.

Somehow, not leaving an interference pattern became twisted into leaving a specific pattern of two lines, which then falsely implies that quantum objects lose their quantum behavior in certain circumstances. The field of quantum physics becomes so much simpler to understand once you realize that all of this is hogwash.

The best reference I can find for where this myth started is a documentary about quantum physics that tries to connect it with mysticism. On the other hand, Wikipedia actually has it correct. In its "which way" section in the double slit experiment page, it correctly says "A well-known thought experiment predicts that if particle detectors are positioned at the slits, showing through which slit a photon goes, the interference pattern will disappear".

reply
What? What LLM were you using a decade ago? Am I misreading you?
reply
You might not be aware of it, but GenAI predates OpenAI, which was founded more than 10 years ago anyway.
reply
Of course I am aware, but how is this relevant today? How does that prove that the science is irrelevant and wasted?
reply
Did I say that the science is irrelevant and wasted?
reply
No. GenAI means LLMs right now. I agree it didn't in the past, but definitions change.
reply
Are you sure you're asking the right questions?
reply
To me they were important questions. Maybe totally uninteresting to you.
reply
What question?
reply
I can't recall but basic stuff like P = NP. /s

My point was precisely to challenge the STOA in those domains, not to ask questions with well-known answers.

reply
What is STOA? Do you mean SOTA?
reply
Yes, sorry, I misspelled it throughout the thread.
reply