I don't think there is anything wrong with the results of this test.
It would be more interesting if we compared them to human results.
If you have trouble distinguishing between human and LLM results, that's interesting.
Also, sentience is irrelevant to this test.
Only if you listen to charlatans.
IOW, that comment was a sarcastic poke from someone who already supports AI workloads at work and has some knowledge of how all this works. ;)