The verbosity is likely a result of the system prompt telling the LLM to be explanatory in its replies. If the system prompt were set to have the model output the shortest possible final answers, you would likely get the result you want. But then for other questions you would lose the benefit of a deeper explanation. It's a design tradeoff, I believe.
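
To illustrate what I mean, here's a minimal sketch that sends the same question under two different system prompts. It assumes an OpenAI-compatible local server (Ollama's default port here) and an illustrative model name; neither comes from this thread.

    # Same question under two system prompts; only the system prompt differs.
    # Assumes an OpenAI-compatible server on localhost:11434 (Ollama's default)
    # and an illustrative model name -- both are assumptions, not from the thread.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

    question = "Why does the sky appear blue?"
    for system in ("You are a helpful assistant.",
                   "Reply with the shortest possible final answer only."):
        resp = client.chat.completions.create(
            model="llama3.1",
            messages=[{"role": "system", "content": system},
                      {"role": "user", "content": question}],
        )
        print(f"--- system: {system!r}")
        print(resp.choices[0].message.content)

In my experience the second prompt trades away the explanatory detail the first one gives you, which is the tradeoff I'm describing.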
reply
My system prompt is the default: "You are a helpful assistant." But that's beside the point. You don't want outputs that are too concise, as that would degrade the result, unless you are using a reasoning model.

I recommend rereading my top-level comment.

reply
Well, when I asked for a very long answer (prompt #2), the quality improved dramatically. So yes, a longer answer produces a better result, at least with the small LLMs I can run locally on my GPU.
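
Roughly how I ran the comparison, as a sketch: the prompts below are illustrative rather than my originals, and the endpoint and model name are the same assumptions as above.

    # Same question, with and without an explicit request for a long answer.
    # Prompts are illustrative stand-ins for my originals; assumes an
    # OpenAI-compatible local server and an illustrative model name.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

    question = "How does garbage collection work in Python?"
    for label, prompt in [("#1 plain", question),
                          ("#2 long", question + " Give a very long, detailed answer.")]:
        resp = client.chat.completions.create(
            model="llama3.1",
            messages=[{"role": "user", "content": prompt}],
        )
        text = resp.choices[0].message.content
        print(f"[{label}] {len(text)} chars")
        print(text[:300])  # peek at the start of each answer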
reply