upvote
I startup 4 or so projects then go do other things for 4 hours. I don’t have enough energy to steer overnight, but I’m at least “semi afk” for daytime steering. So throughput is king for me, tokens per hour. Not latency or actual tokens per second.
reply
Running locally is even worse for this, because if you're running 4 jobs at once they just run at 1/4 speed. Not literally, you can make up some of the difference with batching, but you have limited resources instead of spreading your requests out on an API provider's nodes.
reply
Is interactive use for coding something that actually works today? With unsafe mode, even frontier hosted models are slow enough I end up just tabbing out to work on other tasks. It would need to be much faster if I am to sit and stare at it while it churns. Local models might be a lot slower but workflow-wise it doesn't change much for me.
reply
It's not a bottleneck if you care about the actual code.
reply
I would expect the overwhelming majority of output tokens would not be the actual code but used for analysis, reasoning, testing and iteration. If you only use the agent for autocomplete then yes, the calculation is probably different.
reply
yea, and understanding that too is important. the idea you dont need to read code or analysis seems to align with the depwndcy addiction being shoved in thw pipe.
reply