For JSON to text formatting it works well on a one-round basis. So I think you should realistically have an evaluation ready to go so you can use it on these models. I currently judge them myself but people often use a smart LLM as judge.
Today writing eval harness with Claude is 5 min job. Do it yourself so you can explore as quants on Gemma get better.