This reduces token usage because it asks the model to think in AXON https://colwill.github.io/axon
reply
Yes, but with prompt caching cutting the cost of input by 90%, and with output tokens not being cached and costing more than input to begin with, what do you think that results in?
reply
However, output tokens are 5-10 times more expensive than input tokens, so it ends up a lot more even on price.
reply
Even more so in practice once you factor in prompt caching.
reply
My own output token ratio is 2%, yet with caching that's still where roughly 50% of the potential savings are, on the expensive tokens (I include thinking tokens in this, which are often more than the final output). I have similar tone and output formatting content in my system prompt.
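To make the thread's arithmetic concrete, here's a minimal sketch. The prices are illustrative assumptions, not any provider's actual rates, and `blended_cost` is a name of my own:

```python
# Cost sketch for the thread's arithmetic. Assumed prices: input at
# $3/M tokens, output at 5x that, 90% discount on cached input reads.

INPUT_PRICE = 3.0      # $ per million input tokens (assumed)
OUTPUT_PRICE = 15.0    # $ per million output tokens (5x input, assumed)
CACHE_DISCOUNT = 0.90  # prompt caching: 90% off cached input (from thread)

def blended_cost(output_ratio: float, cached_fraction: float) -> tuple[float, float]:
    """Return (input_cost, output_cost) in $ for 1M total tokens,
    where `output_ratio` of them are output tokens and
    `cached_fraction` of the input tokens are served from cache."""
    input_tokens = 1.0 - output_ratio
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    input_cost = (uncached + cached * (1.0 - CACHE_DISCOUNT)) * INPUT_PRICE
    output_cost = output_ratio * OUTPUT_PRICE
    return input_cost, output_cost

# The 2% output ratio mentioned above, with fully cached input:
inp, out = blended_cost(output_ratio=0.02, cached_fraction=1.0)
print(f"input ${inp:.3f}, output ${out:.3f}, output share {out / (inp + out):.0%}")
# -> input $0.294, output $0.300, output share 51%
# i.e. the 2% of tokens that are output end up as about half the bill,
# so compressing output (AXON-style) attacks half the total cost.
```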
reply