upvote
Forcing short responses will hurt reasoning and chain of thought. There are some potential benefits but forcing response length and when it answers things ironically increases odds of hallucinations if it prioritizes getting the answer out. If it needed more tokens to reason with and validate the response further. It is generally trained to use multiple lines to reason with. It uses english as its sole thinking and reasoning system.

For complex tasks this is not a useful prompt.

reply
> Answer is always line 1. Reasoning comes after, never before.

This doesn't stop it from reasoning before answering. This only affects the user-facing output, not the reasoning tokens. It has already reasoned by the time it shows the answer, and it just shows the answer above any explanation.

reply
The output is part of context. The model reason but also output tokens. Force it to respond in an unfamiliar format and the next token will veer more and more from the training distribution, rendering the model less smart/useful.
reply
It won't matter. By the time it's done reasoning, it has already decided what it wants to say.

Reasoning tokens are just regular output tokens the model generates before answering. The UI just doesn't show the reasoning. Conceptually, the output is something like:

  <reasoning>
    Lots of text here
  </reasoning>
  <answer>
    Part you see here. Usually much shorter.
  </answer>
reply
The reasoning part is not diferente from the part that goes in answer. It’s just that the model is trained to do some magical text generation with back and forth. But when it’s writing the answer part of it, each word is part of its context when generating the next. What that means is that the model does not compute then write, it generates text that guide the next generation in the general direction of the answer.

If you steer it in strange (for it, as in not seen before in training) text, you are now in out-of-distribution, very weak generalization capabilities territory.

reply
> The reasoning part is not diferente from the part that goes in answer.

Exactly. And this instruction isn't telling it to skip the reasoning. That part is unaffected. The instruction is only for the user-visible output.

By the time the reasoning models get to writing the output you see, they've already decided what they are going to say. The answer is based on whatever it decided while reasoning. It doesn't matter whether you tell it to put the answer first or the explanation first. It already knows both by the time it starts outputting either.

You're basically hoping that adding more CoT in the output after reasoning will improve the answer quality. It won't. It's already done way more CoT while reasoning, and its answer is already decided by then.

reply
>The “answer before reasoning” is a good evidence for it. It misses the most fundamental concept of tranaformers: the are autoregressive.

I don't think it's fair to assume the author doesn't understand how transformers work. Their intention with this instruction appears to aggressively reduce output token cost.

i.e. I read this instruction as a hack to emulate the Qwen model series's /nothink token instruction

If you're goal is quality outputs, then it is likely too extreme, but there are otherwise useful instructions in this repo to (quantifiably) reduce verbosity.

reply
If they want to reduce token cost, just use a smaller model instead of dumbing down a more expensive.
reply
Don't most providers already provide API control over the COT length? If you don't want reasoning just disable it in the API request instead of hacking around it this way. (Internally I think it just prefills an empty <thinking></thinking> block, but providers that expose this probably ensure that "no thinking" was included as part of training)
reply
To me it’s as simple as “who knows best how to harness the premier LLM – Anthropic, the lab that created it, or this random person?”

That’s why I’m only interested in first party tools over things like OpenCode right now.

reply