A poly is only testing one thing: can you convince the polygrapher that you can lie successfully
Just as a sociopath can learn to control their physiological response to beat a polygraph, a deceptively aligned model learns to control its token distribution to beat safety benchmarks. In both cases, the detector is fundamentally flawed because it relies on external signals to judge internal states.