I think some of the folks in the local LLM social media communities are using them for things like company-hosted customer service chatbots, or purely English prose writing, where Q4 probably won't cause a problem. For more precise technical work I stick pretty much exclusively to Q8.
Quality starts to fall off with smaller quants of the model weights, but more importantly, it falls off much faster with smaller KV-cache quants, so avoid going below Q8 there.
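To give a rough feel for why fewer bits hurt, here's a minimal sketch that round-trips some random values through a simplified symmetric block-wise quantizer at 8 and 4 bits and compares the rounding error. Everything in it is an assumption for illustration: the `roundtrip_error` helper, the block size of 32, and the Gaussian test data are made up, and this is not the exact format any particular runtime uses. It only measures raw rounding error, not actual output quality, and it doesn't show why the KV cache is more sensitive than the weights.

```python
import random


def roundtrip_error(values, bits, block_size=32):
    """Mean absolute error after a quantize/dequantize round trip.

    Simplified scheme (an assumption, not a real runtime's format):
    each block of `block_size` values shares one scale, and values are
    rounded to signed integers in [-qmax, qmax].
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    total = 0.0
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        scale = amax / qmax if amax > 0 else 1.0
        for v in block:
            q = max(-qmax, min(qmax, round(v / scale)))  # quantize
            total += abs(v - q * scale)                  # dequantize and compare
    return total / len(values)


# Hypothetical test data: 4096 values drawn from a unit Gaussian.
rng = random.Random(0)
values = [rng.gauss(0.0, 1.0) for _ in range(4096)]

err8 = roundtrip_error(values, bits=8)
err4 = roundtrip_error(values, bits=4)

print(f"8-bit mean abs error: {err8:.5f}")
print(f"4-bit mean abs error: {err4:.5f}")
print(f"4-bit error is {err4 / err8:.1f}x the 8-bit error")
```

The 4-bit grid has far fewer levels per block (15 vs. 255 here), so each value lands further from its original on average. Whether that matters for your use case is something you'd still want to check on your own prompts.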