That's what fortunetellers do. The problem isn't guessing correctly about AI content in writing. The problem is false positives. That's what puts it in the same category as predictive policing scam software. And fortunetelling.
False positive and false negative rates are non-zero, as with almost anything, but the tools are pretty good. I encourage you to give them a try. Pangram is a good, state-of-the-art choice, and you can try it for free. They also publish evals and other data about their approach.
I think you're basing this on a fundamental misunderstanding of what these detectors look for. LLMs generate human-like text, but they also generate roughly the same style and content every time for a given prompt, modulo some small amount of nondeterminism. In essence, they are a very predictable human. Ask Gemini or ChatGPT ten times in a row to write an essay about why AI is awesome, and it will probably strike about the same tone every single time, with similar syntax, idioms, etc.
This is what these tools detect: the default output of "hey ChatGPT, write me a school essay about X". This can be evaded with clever prompting to assume a different writing personality, but there's only so much evasion you can do without making the text weird in other ways.
That's not like detecting thoughts via fMRI; it's like detecting tomorrow's malware with yesterday's malware signatures. Or like researchers trying to make a vaccine against the common cold.
And the obvious proposal to fix that has been made multiple times in this thread: don't make take-home tasks part of the grade. Instead of trying to punish what you can't reliably detect, take away the incentive to do it in the first place.
I don't understand your argument. The vendors for these detection tools can acquire recent samples from all frontier models just as easily as you can use them to write essays. There's nothing that requires a one-year delay.
Do AI vendors specifically train models to circumvent AI detectors? Why would they?
Different people. I for one have always claimed that fMRI is too coarse-grained for detailed thought detection.
If AI detection "sometimes fails", it doesn't "work". It works well enough to convict someone with other evidence, but when there's no other evidence nor an attempt to get any, it has no good use.
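To make the false-positive concern concrete, here is a rough sketch of the base-rate arithmetic (Bayes' rule). All rates below are made-up assumptions for illustration, not measured figures for Pangram or any other detector, and the function names are hypothetical.

```python
# Hypothetical rates for illustration only; not measurements of any real detector.

def posterior_ai_given_flag(prior: float, tpr: float, fpr: float) -> float:
    """Bayes' rule: probability an essay is AI-written, given that the detector flagged it."""
    p_flag = tpr * prior + fpr * (1 - prior)  # total probability of a flag
    return tpr * prior / p_flag


def expected_false_flags(n_essays: int, prior: float, fpr: float) -> float:
    """Expected number of human-written essays that get flagged anyway."""
    return n_essays * (1 - prior) * fpr


if __name__ == "__main__":
    tpr, fpr = 0.95, 0.01  # assumed detection rate and false positive rate
    for prior in (0.30, 0.10, 0.01):  # assumed share of essays that are AI-written
        post = posterior_ai_given_flag(prior, tpr, fpr)
        false_flags = expected_false_flags(1000, prior, fpr)
        print(f"prior={prior:.2f}  P(AI | flag)={post:.3f}  "
              f"false flags per 1000 essays={false_flags:.1f}")
```

Under these assumed numbers, when only 1% of essays are AI-written, about half of all flags land on honest students (0.0095 / 0.0194 ≈ 0.49). Independent evidence effectively raises the prior before the detector is consulted, which is why a flag is much more meaningful alongside other evidence than on its own.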
What I propose is simple: grade only closed-book exams, and hold students' phones during the exams. Students don't need 1:1 monitoring; it's the same as 10 to 20 years ago.