undefined

points

[-]

I startup 4 or so projects then go do other things for 4 hours. I don’t have enough energy to steer overnight, but I’m at least “semi afk” for daytime steering. So throughput is king for me, tokens per hour. Not latency or actual tokens per second.

by smallerize23 hours ago|

parent|

[-]

Running locally is even worse for this, because if you're running 4 jobs at once they just run at 1/4 speed. Not literally, you can make up some of the difference with batching, but you have limited resources instead of spreading your requests out on an API provider's nodes.

by sweetjuly22 hours ago|

prev|

[-]

Is interactive use for coding something that actually works today? With unsafe mode, even frontier hosted models are slow enough I end up just tabbing out to work on other tasks. It would need to be much faster if I am to sit and stare at it while it churns. Local models might be a lot slower but workflow-wise it doesn't change much for me.

by cyanydeez23 hours ago|

prev|

[-]

It's not a bottleneck if you care about the actual code.

by pqtyw23 hours ago|

parent|

[-]

I would expect the overwhelming majority of output tokens would not be the actual code but used for analysis, reasoning, testing and iteration. If you only use the agent for autocomplete then yes, the calculation is probably different.

by cyanydeez23 hours ago|

parent|

[-]

yea, and understanding that too is important. the idea you dont need to read code or analysis seems to align with the depwndcy addiction being shoved in thw pipe.