This reduces token usage because it asks the model to think in AXON https://colwill.github.io/axon
reply
Yes, but with prompt caching cutting the cost of input by 90%, and with output tokens not being cached and costing more than input to begin with, what do you think that results in?
reply
However, output tokens are 5-10 times more expensive than input tokens, so it ends up a lot more even on price.
reply
Even more so in practice once you factor in prompt caching.
reply
My own output token ratio is 2%, yet with caching that's still where roughly 50% of the potential savings are, on the expensive tokens (I include thinking tokens in this, which are often more than the final output). I have similar tone and output formatting content in my system prompt.
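To make the thread's arithmetic concrete, here's a minimal sketch. The prices are illustrative assumptions, not any provider's actual rates, and `blended_cost` is a name of my own:

```python
# Cost sketch for the thread's arithmetic. Assumed prices: input at
# $3/M tokens, output at 5x that, 90% discount on cached input reads.

INPUT_PRICE = 3.0      # $ per million input tokens (assumed)
OUTPUT_PRICE = 15.0    # $ per million output tokens (5x input, assumed)
CACHE_DISCOUNT = 0.90  # prompt caching: 90% off cached input (from thread)

def blended_cost(output_ratio: float, cached_fraction: float) -> tuple[float, float]:
    """Return (input_cost, output_cost) in $ for 1M total tokens,
    where `output_ratio` of them are output tokens and
    `cached_fraction` of the input tokens are served from cache."""
    input_tokens = 1.0 - output_ratio
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    input_cost = (uncached + cached * (1.0 - CACHE_DISCOUNT)) * INPUT_PRICE
    output_cost = output_ratio * OUTPUT_PRICE
    return input_cost, output_cost

# The 2% output ratio mentioned above, with fully cached input:
inp, out = blended_cost(output_ratio=0.02, cached_fraction=1.0)
print(f"input ${inp:.3f}, output ${out:.3f}, output share {out / (inp + out):.0%}")
# -> input $0.294, output $0.300, output share 51%
# i.e. the 2% of tokens that are output end up as about half the bill,
# so compressing output (AXON-style) attacks half the total cost.
```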
reply