by consumer451 19 hours ago

by letitgo12345 19 hours ago

Long Codex sessions lead to a lot of cached-token hits, especially when you resume them after a few hours.

by consumer451 17 hours ago

I personally don't count cached hits as $used... Neither in my harnesses, nor in the LLM-enabled apps I create. A cached token cannot be counted 1:1 with a non-cached token, that would be silly.

Wait... when some Claude 5x/20x users say they are getting "$2000 of tokens for $100," does the 2k value include cached tokens, counted at the same $/token either way?

We cannot be this dumb as a community, can we? I must be wrong/misunderstanding..

by SatvikBeri 15 hours ago

I'm a fairly moderate user and have never hit any kind of usage limit, but I used 44 million cache-create tokens and 1.5 billion cache-read tokens. ccusage estimates that would have cost $990, and it calculates the different categories separately.
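To make the "different categories separately" point concrete, here is a minimal sketch of per-category pricing versus counting every token at one flat rate. The rates, the `HYPOTHETICAL_RATES` table, and both function names are made up for illustration; they are not ccusage's actual logic or any provider's price list, so the result will not reproduce the $990 figure.

```python
# Illustrative per-million-token prices in USD. These are assumptions for the
# sketch, not figures from the thread or from any provider's price list.
HYPOTHETICAL_RATES = {
    "input": 3.00,
    "cache_create": 3.75,
    "cache_read": 0.30,
    "output": 15.00,
}


def estimate_cost(usage, rates=HYPOTHETICAL_RATES):
    """Price each token category at its own rate and sum the results."""
    return sum(
        tokens / 1_000_000 * rates[category]
        for category, tokens in usage.items()
    )


def naive_cost(usage, flat_rate):
    """Price every token at one flat rate, ignoring any cache discount."""
    return sum(usage.values()) / 1_000_000 * flat_rate


# Token counts quoted in the comment above.
usage = {"cache_create": 44_000_000, "cache_read": 1_500_000_000}

print(f"per-category: ${estimate_cost(usage):,.2f}")
print(f"flat rate:    ${naive_cost(usage, HYPOTHETICAL_RATES['input']):,.2f}")
```

Under these assumed rates the flat-rate number comes out several times larger than the per-category one, which is the gap the parent comment is worried about.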

by andai 19 hours ago

Vibe coded a simple game (10,000 tokens of source code) with two popular coding agents. (Once each, to compare.)

One spent 200,000 tokens, to produce 10,000.

The other spent 1.9 million.

It could have been a single LLM call (10k tokens). lmao

(I note that the latter was designed by a company whose main source of revenue is token spend...)
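For scale, a quick sketch of the overhead implied by those numbers: tokens spent per token of final source code. The helper name `overhead_ratio` and the agent labels are placeholders, since the comment does not name the agents.

```python
def overhead_ratio(tokens_spent, tokens_produced):
    """Tokens consumed by the agent per token of final output."""
    return tokens_spent / tokens_produced


source_tokens = 10_000  # size of the finished game, per the comment

# Placeholder labels; the comment does not say which agents these were.
runs = {"agent A": 200_000, "agent B": 1_900_000}

for agent, spent in runs.items():
    print(f"{agent}: {overhead_ratio(spent, source_tokens):.0f}x")
```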

by crab_galaxy 19 hours ago

What about the other 998 million tokens?

by andai 3 hours ago

Ya got me there. Maybe he's running OpenClaw?

by stronglikedan 17 hours ago

lots and lots of simple games

by skeptic_ai 18 hours ago

Don’t forget context. Basically I have 2 billion input tokens and 1 million output tokens. Every prompt you send resends the whole context again and again. Let’s say you have 500k of context used: 10 messages is 5 million tokens, 100 messages is 50 million. Use 5 threads and that's 250 million.
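The arithmetic above can be sketched as a one-line function. This assumes a fixed context size that is resent in full on every message; in a real session the context usually grows turn by turn, so treat it as a rough estimate. The name `resent_input_tokens` is made up for this example.

```python
def resent_input_tokens(context_tokens, messages, threads=1):
    """Input tokens sent when every message resends a fixed-size context."""
    return context_tokens * messages * threads


context = 500_000  # tokens of context in use, per the comment

print(resent_input_tokens(context, messages=10))              # 10 messages
print(resent_input_tokens(context, messages=100))             # 100 messages
print(resent_input_tokens(context, messages=100, threads=5))  # 5 threads
```

These calls give 5 million, 50 million, and 250 million input tokens respectively, matching the figures in the comment.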

by consumer451 18 hours ago

But how is it even possible (bad harness?), or wise, to send 500k or 1M tokens per call? Regarding cache, how are you not hitting the 1hr cache? Also, start new chats early and often!

I have been "agentic coding" since Sonnet 3.5 and after this paper came out, it became my bible:

https://github.com/adobe-research/NoLiMa

Last I checked, all models suck as you fill the context window. "Context engineering" is how you do this whole thing.

by azuanrb 9 hours ago

[dead]