It helps to understand that, because then you can also stop being annoyed by things like "Let's do X. No, wait, X has this problem, let's do Y instead." You might think to yourself, "If X was a bad idea, couldn't it have considered X and rejected it without outputting a token?" The answer is that the sentence you just read was the model considering X and rejecting it, and no, there is no way for it to do that without emitting tokens. Thinking is inextricably tied to output for LLMs.
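To make that concrete, here's a toy sketch of an autoregressive loop. Everything in it is hypothetical (`toy_next_token` is a hand-written stand-in, not a real model or any library's API), and it glosses over the computation that happens inside a single step. The only point is that the one thing carried from step to step is the list of tokens already emitted, so the "rejection" of X can only exist as emitted tokens.

```python
from typing import Callable, List

END = "<end>"


def toy_next_token(context: List[str]) -> str:
    """Hypothetical stand-in for a model: a pure function of the tokens so far."""
    if not context:
        return "try"
    last = context[-1]
    if last == "try":
        return "X."
    if last == "X.":
        # The "change of mind" is itself just the next token.
        return "No,"
    if last == "No,":
        return "use"
    if last == "use":
        return "Y."
    return END


def generate(next_token: Callable[[List[str]], str], max_tokens: int = 16) -> List[str]:
    """Run the loop: each step sees only the tokens emitted so far."""
    context: List[str] = []
    for _ in range(max_tokens):
        token = next_token(context)
        if token == END:
            break
        # Emitting the token is the only way anything is carried forward.
        context.append(token)
    return context


print(" ".join(generate(toy_next_token)))
```

If you cap the loop before the "No," gets emitted, say with `max_tokens=2`, the output is stuck at "try X.", because in this toy there is nowhere else for the second thought to live.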
There is even some fairly substantial evidence, from a couple of different angles, that the thinking output is only loosely correlated with what the model is "actually" doing.
Token efficiency is an interesting question to ponder, and it is worth worrying that providers have an incentive to be flabby with their tokens when you're paying per token. But the question is certainly not as easy as just trying to get the models to be "more succinct" in general.
I often discuss a "next gen" AI architecture after LLMs, and I anticipate that one of its differences will be the ability to think without also having to output anything. LLMs are really nifty, but they store too much of their "state" in their own output. As a human being, I find, like many other people, that writing stuff down helps when I'm doing deep thinking on a topic, but it certainly isn't necessary for me to continuously output things in order to think. If anything I'm on the "absent-minded"/"scatterbrained" side: if I'm storing a lot of my state in my output from the past couple of hours, it sure isn't terribly accessible to my conscious mind when I open the pantry door only to find I've totally forgotten why, somewhere between having the reason and walking to the pantry.