The problem is that sign(x[n-1]) != sign(x[n]) describes a place where two successive samples differ in sign, but no sample actually has a value of zero. Thus, to perform an edit there, if your goal is to avoid the click caused by truncating at a non-zero sample value, you need to add a zero-valued sample or assign zero to an existing one. This introduces distortion - you are artificially changing the shape of the waveform, which implies the introduction of all kinds of frequency artifacts.
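To make that concrete, here is a rough sketch in plain Python that counts sign changes and exactly-zero samples in a sampled sine tone. The 440 Hz tone, the 44.1 kHz rate, the phase offset and the helper names are all my own assumptions for illustration, not anything from a real editor.

```python
import math

def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def count_crossings_and_zeros(x):
    """Count sign changes between neighbours and samples that are exactly 0."""
    sign_changes = sum(
        1 for n in range(1, len(x)) if sign(x[n - 1]) != sign(x[n])
    )
    exact_zeros = sum(1 for v in x if v == 0)
    return sign_changes, exact_zeros

# Assumed test signal: one second of a 440 Hz sine at 44.1 kHz,
# with an arbitrary phase offset so it does not start on a zero.
SAMPLE_RATE = 44100
x = [
    math.sin(2 * math.pi * 440 * n / SAMPLE_RATE + 0.3)
    for n in range(SAMPLE_RATE)
]

sign_changes, exact_zeros = count_crossings_and_zeros(x)
print("sign changes:", sign_changes)
print("samples exactly equal to zero:", exact_zeros)
```

The waveform crosses zero hundreds of times, but every crossing falls between two samples, so there is no sample you can cut on that is already zero.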
Zero crossings are not computed by finding a minimum between two consecutive samples - that would almost never involve a sign change. And if they are computed by finding the minimum between two consecutive samples that also involves a sign change, there's a very good chance that you'll be a long way from your desired cut point, even if you ignore the distortion issue.
It really was a completely misguided idea. If the situation was:
sign(x[n-2]) != sign(x[n]) && x[n-1] == 0
then it would be great. But this essentially never happens in real audio.
No, you (the editor, not an algorithm) look at the waveform and see where the amplitude begins to significantly oscillate and place the edit at a reasonable point, like where the signal is near the noise floor and at a point where it crosses zero. There's no zero stuffing.
This kind of thing isn't computed; a human being is looking at the waveform and listening back to choose where to drop the edit point. You don't always get it pop-free, but it's much better than an arbitrary point where the waveform is rising.
I mean, you could use an algorithm for this. It would be a pair of averaging filters, like a VAD but with lookahead, picking a point some distance before activity is detected (peak - noise_floor > threshold), which could be where avg(x[n-N..n]) ~= noise_floor && sign(x[n]) != sign(x[n-1]).
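Something like the following, as a very rough sketch rather than a real VAD. It uses a windowed mean of |x| as the level estimate instead of a true peak detector, and find_edit_point, window, threshold and tolerance are names and values I made up for illustration.

```python
import math
import random

def sign(v):
    return (v > 0) - (v < 0)

def mean_abs(x, start, stop):
    """Average magnitude of x[start:stop], clamped at the start of x."""
    seg = x[max(start, 0):stop]
    return sum(abs(v) for v in seg) / len(seg) if seg else 0.0

def find_edit_point(x, noise_floor, threshold, window=64, tolerance=2.0):
    """Return an index just before activity starts, or None.

    Step 1 (activity detection): find the first window whose average
    level exceeds the noise floor by more than `threshold`.
    Step 2 (lookahead): walk backwards from there to the nearest sample
    where the trailing average is still near the noise floor and the
    sign differs from the previous sample.
    """
    onset = None
    for n in range(window, len(x) + 1):
        if mean_abs(x, n - window, n) - noise_floor > threshold:
            onset = n - 1  # last sample of the window that tripped
            break
    if onset is None:
        return None
    for n in range(onset, window, -1):
        quiet = mean_abs(x, n - window, n + 1) <= tolerance * noise_floor
        if quiet and sign(x[n]) != sign(x[n - 1]):
            return n
    return None

# Assumed test signal: 2000 samples of low-level noise, then a 220 Hz tone.
rng = random.Random(0)
x = [rng.uniform(-0.001, 0.001) for _ in range(4000)]
for n in range(2000, 4000):
    x[n] += 0.5 * math.sin(2 * math.pi * 220 * (n - 2000) / 44100)

noise_floor = mean_abs(x, 0, 1000)  # measured from a stretch known to be quiet
cut = find_edit_point(x, noise_floor, threshold=0.05)
print("suggested edit point:", cut, "sample value there:", x[cut])
```

The point it picks sits at a sign change right before the tone starts, which is roughly what you would choose by eye.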
I agree with this, but that doesn't invalidate anything I've said. When you or a bit of software decide to make the cut at x[n], you are faced with the near certainty that x[n] != 0. If you set it (or x[n+1]) to zero, you add distortion; if you don't, the risk of a pop is significant.
By contrast, if you apply a fade, the risk of getting a pop is negligible and you can make the cut anywhere you want, without having to zoom in to 1 sample per pixel or finer and study the details of the waveform.
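A minimal sketch of what I mean, assuming a plain linear fade-out over the last 10 ms before the cut (the fade shape, the fade length and the function name cut_with_fade_out are just my choices for the example):

```python
import math

def cut_with_fade_out(x, cut, fade_len):
    """Keep x[:cut] and ramp the last fade_len kept samples down to zero."""
    fade_len = min(fade_len, cut)
    out = list(x[:cut])
    for i in range(fade_len):
        # Gain falls linearly and reaches exactly 0.0 on the last kept sample.
        gain = 1.0 - (i + 1) / fade_len
        out[cut - fade_len + i] *= gain
    return out

# Assumed test signal: a 440 Hz sine at 44.1 kHz, amplitude 0.8.
SAMPLE_RATE = 44100
x = [0.8 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(4000)]

# Deliberately pick the worst case: end the clip on a waveform peak.
cut = max(range(900, 1100), key=lambda n: abs(x[n])) + 1

hard = list(x[:cut])                                # plain truncation
faded = cut_with_fade_out(x, cut, fade_len=441)     # 10 ms linear fade

print("last sample, hard cut:", hard[-1])
print("last sample, with fade:", faded[-1])
```

With the hard cut, the clip ends on a sample near full amplitude and then drops straight to silence. With the fade, the last sample is zero no matter where the cut lands, so there is no need to hunt for a crossing.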