It's a good question. Costs will be lumpy. Inference servers have a preferred batch size; once you have a server, you can scale the number of users up to that batch size at relatively low marginal cost. Then you need to add another server (or rack), which is another large step cost.

However I think it's fair to say the cost is roughly linear in the number of users other than that.
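
To make the "lumpy but roughly linear" shape concrete, here's a quick sketch of that step-cost model. The batch size and per-server cost are made-up illustrative numbers, not real figures:

```python
import math

def serving_cost(users, batch_size=64, server_cost=10_000):
    """Total cost as a step function: each server handles up to
    `batch_size` concurrent users, so cost jumps at each multiple.
    batch_size and server_cost are hypothetical, for illustration only."""
    servers = math.ceil(users / batch_size) if users else 0
    return servers * server_cost

# Flat within a batch, then a jump:
# serving_cost(1) and serving_cost(64) use one server;
# serving_cost(65) needs a second one.
```

Averaged over many steps, this is linear in the number of users, which is the point above.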

There may be some aspects that are slightly sublinear, e.g. when multiple users submit similar queries and parts of the work can be cached or shared... but I don't think this would be significant.

reply
N*log(N) grows slowly enough relative to N that it behaves almost linearly for most realistic use cases.

As for LLMs, there is probably some fixed cost added once the model fits on a single GPU, but beyond that it should scale almost linearly.
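
To put a number on how slowly the log factor grows (my own sketch, not from the comment above):

```python
import math

# log2(N) barely moves as N explodes: even at a billion users
# the log factor is only about 30, so N*log(N) is close to linear
# over any realistic range of N.
for n in [1_000, 1_000_000, 1_000_000_000]:
    print(f"N = {n:>13,}  log2(N) = {math.log2(n):.1f}")
```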
