So within the same model family, the bigger model quantized harder tends to win at a given memory footprint: you can generally quantize all the way down to around 2 bits before it makes sense to drop to the next smaller model size.
Between families, there will obviously be more variation. You really need evals specific to your use case if you want to compare them: different families can perform quite differently on different types of problems, and since everyone optimizes for the public benchmarks, having your own is the only way to really test it out.
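By "your own evals" I just mean something like this minimal sketch (using llama-cpp-python; the GGUF paths, prompts, and the crude scoring check are all placeholders you'd swap for your actual workload):

```python
# Minimal use-case-specific eval sketch, assuming llama-cpp-python and two
# GGUF quantizations you want to compare (file names are placeholders).
from llama_cpp import Llama

# Prompts and expected answers drawn from your real workload.
EVAL_SET = [
    ("Extract the total from: 'Total due: $1,234.56'", "1234.56"),
    ("Translate to French: 'good morning'", "bonjour"),
]

def score(model_path: str) -> float:
    llm = Llama(model_path=model_path, n_ctx=2048, verbose=False)
    correct = 0
    for prompt, expected in EVAL_SET:
        out = llm(prompt, max_tokens=64, temperature=0.0)
        text = out["choices"][0]["text"]
        # Crude substring check; replace with whatever metric fits your task.
        if expected.lower() in text.lower():
            correct += 1
    return correct / len(EVAL_SET)

# e.g. smaller model at higher precision vs. bigger model quantized harder
for path in ["model-7b-Q6_K.gguf", "model-13b-Q2_K.gguf"]:
    print(path, score(path))
```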
...this can't be literally true or no one (including e.g. OpenAI) would use > 6 bits, right?