Output limit has consistently been 64k tokens (including 2.5 pro).
reply
People did find Gemini very talkative so it might be a response to that.
reply
> Even when the model is explicitly instructed to pause due to insufficient tokens

Is there actually a chance it has the introspection to do anything with this request?

reply
Yeah, it does. It was possible with 2.5 Flash.

Here's a similar result with Qwen's Qwen3.5-397B-A17B: https://chat.qwen.ai/s/530becb7-e16b-41ee-8621-af83994599ce?...

reply
OK, it prints some stuff at the end, but does it actually count the output tokens? Was that part already built in somehow? Or is it just retrying until it has enough space to add the footer?
reply
No, the model doesn't have visibility into this, afaik.

I'm not even sure what "pausing" means in this context, or why it would help when there aren't enough tokens. Models just stop when they hit the output limit, whether it's the default or a manually specified one; it's a hard cutoff, not something the model reasons about.

You can see what happens by setting the output token limit much lower.
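To make the "hard cutoff" point concrete, here's a toy sketch (not any vendor's actual serving code) of what an output-token cap does: generation is just a loop, and the cap stops the loop. The model never "sees" the budget; there is nothing to pause.

```python
# Toy sketch of an output-token cap. The serving loop enforces the limit;
# the "model" (here, a fixed token list) has no knowledge of it.

def generate(tokens, max_output_tokens):
    """Pretend-decode: emit tokens until EOS or the hard cap."""
    out = []
    for tok in tokens:
        if len(out) >= max_output_tokens:
            # Truncated mid-thought: no pause, no footer, just a cutoff.
            return out, "MAX_TOKENS"
        if tok == "<eos>":
            return out, "STOP"
        out.append(tok)
    return out, "STOP"

answer = ["The", "answer", "is", "long", "...", "<eos>"]

# With a low cap you get a truncation finish reason; with a high cap,
# a normal stop. This mirrors what you see when you lower the limit
# in a real API and inspect the finish reason.
print(generate(answer, max_output_tokens=3))    # (['The', 'answer', 'is'], 'MAX_TOKENS')
print(generate(answer, max_output_tokens=100))  # (['The', 'answer', 'is', 'long', '...'], 'STOP')
```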

reply
> Even when the model is explicitly instructed to pause due to insufficient tokens rather than generating an incomplete response

AI models can't do this, at least not from an instruction alone. Maybe it would work if you built some kind of custom 'agentic' setup around the model.
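A minimal sketch of what such a custom setup could look like, with `call_model` as a hypothetical stand-in for a real API call: the orchestrator, not the model, watches the finish reason and requests a continuation when output was truncated.

```python
# Hedged sketch of an 'agentic' continuation loop. `call_model` is a stub
# standing in for a real API call; the names and behavior are assumptions,
# not any specific vendor's interface.

def call_model(prompt, max_output_tokens):
    """Stub model: the full answer is 10 chunks; each call emits up to
    `max_output_tokens` of them, resuming where the prompt left off."""
    full = [f"part{i}" for i in range(10)]
    done = prompt.count("part")  # crude "resume point" read off the prompt
    chunk = full[done:done + max_output_tokens]
    finish = "STOP" if done + len(chunk) >= len(full) else "MAX_TOKENS"
    return chunk, finish

def generate_with_continuation(prompt, max_output_tokens=4, max_rounds=10):
    out = []
    for _ in range(max_rounds):
        chunk, finish = call_model(prompt + " ".join(out), max_output_tokens)
        out.extend(chunk)
        if finish != "MAX_TOKENS":
            break  # the model stopped on its own
        # Truncated: the harness, not the model, decides to continue.
    return out

print(generate_with_continuation("Q:"))  # all 10 parts despite a 4-chunk cap
```

The point is that the "pausing" logic lives entirely in the harness; the model itself only ever sees a prompt and emits tokens until it stops or is cut off.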

reply