I was wondering about that statement. Shouldn't it restrict sampling to only tokens that produce valid JSON matching the schema during a tool call? On the other hand, I have heard a lot about how even production LLM providers don't always call tools accurately, so I suppose either it's hard to implement what I described or there's something I haven't thought of that makes it impossible.