Output is what most of the compute goes to; generating it costs more hardware time than prompt processing (input), which is a lot faster.
Input tokens are processed at roughly 10-50 times the speed of output tokens, since you can process them in batches rather than one at a time like output tokens.
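To put rough numbers on that, here is a back-of-the-envelope sketch. The function name `generation_time`, the 50 tokens/s decode rate, and the 20x prefill speedup are all made-up illustrative values (the 20x is just a point inside the 10-50x range above), not measurements from any real model or provider.

```python
def generation_time(input_tokens, output_tokens, decode_tps=50.0, prefill_speedup=20.0):
    """Return (prefill_seconds, decode_seconds) under a simple throughput model.

    decode_tps and prefill_speedup are assumed values for illustration only.
    """
    # Input tokens are processed in batches, so prefill throughput is modeled
    # as a multiple of the one-token-at-a-time decode throughput.
    prefill_tps = decode_tps * prefill_speedup
    prefill_s = input_tokens / prefill_tps
    decode_s = output_tokens / decode_tps
    return prefill_s, decode_s


prefill_s, decode_s = generation_time(input_tokens=2000, output_tokens=500)
print(f"prefill: {prefill_s:.1f}s, decode: {decode_s:.1f}s")
# prints "prefill: 2.0s, decode: 10.0s"
```

So under these assumptions a request with 4x more input than output still spends about 5x longer on the output, which fits the point about output dominating hardware time.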