> 2-bit quantization produces \name\ instead of "name" in JSON output, making tool calling unreliable.

I was wondering about that statement. Shouldn't the inference engine restrict sampling to only those tokens that produce valid JSON matching the schema during a tool call? On the other hand, I have heard a lot about how even production LLM providers don't always call tools accurately, so I suppose either what I described is hard to implement or there's something I haven't thought of that makes it impossible.
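To make concrete what I mean by restricting sampling, here is a toy sketch. Everything in it is made up for illustration: the eight-token `VOCAB`, the two-entry `ALLOWED` list standing in for a schema, and `fake_logits`, which plays the role of a damaged model that always prefers the broken `\name\` token. A real engine would track a grammar or JSON-schema state machine rather than enumerate every allowed output.

```python
import json
import math

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ['{', '}', '"name"', '\\name\\', ':', '"Ada"', '"Bob"', '<eos>']

# Every complete output the toy "schema" allows: {"name": <one of two strings>}.
ALLOWED = ['{"name":"Ada"}', '{"name":"Bob"}']


def fake_logits(prefix):
    """Stand-in for a degraded model: it always scores the broken token highest."""
    scores = {tok: 0.0 for tok in VOCAB}
    scores['\\name\\'] = 5.0
    scores['"Ada"'] = 1.0
    return scores


def is_valid_next(prefix, token):
    """A token is allowed if the text stays a prefix of some allowed output."""
    if token == '<eos>':
        return prefix in ALLOWED
    candidate = prefix + token
    return any(out.startswith(candidate) for out in ALLOWED)


def decode(constrained, max_steps=10):
    """Greedy decoding, optionally masking tokens that would break the schema."""
    text = ''
    for _ in range(max_steps):
        scores = fake_logits(text)
        if constrained:
            # Set the score of every schema-violating token to -inf.
            scores = {
                tok: (s if is_valid_next(text, tok) else -math.inf)
                for tok, s in scores.items()
            }
        token = max(scores, key=scores.get)
        if token == '<eos>':
            break
        text += token
    return text


if __name__ == '__main__':
    print(json.loads(decode(constrained=True)))
    try:
        json.loads(decode(constrained=False))
    except json.JSONDecodeError:
        print('unconstrained output is not valid JSON')
```

With the mask on, the broken token can never be picked, so the output always parses. The mask only guarantees well-formed output, though. Which of the allowed values gets chosen still depends on the model's scores, so a damaged model could emit valid JSON with the wrong arguments.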
