by jrandolf14 hours ago |

by mogili114 hours ago|

A rate limit is essentially a token limit.

by ibejoeb12 hours ago|

It depends on how it's implemented. If it's a fixed window, then your absolute ceiling is tokens per window times the number of windows in a month. If it's a function of other usage, like a timeshare, you're still paying a set price for a month and you get what you get without paying more per token. There's an intrinsic limit based on how many tokens the model can process on that GPU in a month anyway, even if it's only you.
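To make the two ceilings concrete, here is a rough sketch in Python. Every number in it (tokens per window, window length, throughput) is a made-up assumption for illustration, not a figure from any real plan, and the function names are hypothetical.

```python
def monthly_token_ceiling(tokens_per_window: int, window_hours: float,
                          days_in_month: int = 30) -> int:
    """Fixed-window ceiling: tokens per window times windows in a month."""
    windows_per_month = (days_in_month * 24) / window_hours
    return int(tokens_per_window * windows_per_month)


def hardware_token_ceiling(tokens_per_second: float,
                           days_in_month: int = 30) -> int:
    """Intrinsic ceiling: what the GPU could emit running flat out all month."""
    seconds_per_month = days_in_month * 24 * 3600
    return int(tokens_per_second * seconds_per_month)


# Hypothetical plan: 1M tokens per 5-hour window.
plan_ceiling = monthly_token_ceiling(1_000_000, window_hours=5)

# Hypothetical sustained throughput of 50 tokens/second for a single user.
gpu_ceiling = hardware_token_ceiling(50)

# Whichever limit is lower is the one you actually hit.
effective_ceiling = min(plan_ceiling, gpu_ceiling)
print(plan_ceiling, gpu_ceiling, effective_ceiling)
```

With these assumed numbers the hardware limit, not the rate limit, is the binding one, which is the "there's always a limit" point.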

by delusional9 hours ago|

Time x capacity is also a limit. There's always a limit.

by freedomben14 hours ago|

Is there any way to buy into a pool of people with similar usage patterns? Maybe I'm overthinking it, but I'm just wondering.

by ssl-312 hours ago|

I think it'd be best to pool with people with different patterns, not the same patterns. Perhaps it would be best to pool with people in different timezones, and/or with different work/sleep schedules.

If everyone in a pool uses it during the ~same periods and sleeps during the ~same periods, then the node would oscillate between contention and idle -- every day. This seems largely avoidable.
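A toy simulation of that oscillation, as a sketch only: it assumes each pool member is active for 8 straight hours starting at 09:00 local time, and that UTC offsets are whole hours. The `hourly_load` helper and the example pools are hypothetical.

```python
def hourly_load(utc_offsets, start_local=9, active_hours=8):
    """Count how many pool members are active in each UTC hour of the day."""
    load = [0] * 24
    for offset in utc_offsets:
        for h in range(active_hours):
            # Convert the member's local active hour to UTC.
            utc_hour = (start_local + h - offset) % 24
            load[utc_hour] += 1
    return load


# Three members in the same timezone vs. three spread 8 hours apart.
same = hourly_load([0, 0, 0])
spread = hourly_load([-8, 0, 8])

for name, load in (("same", same), ("spread", spread)):
    print(name, "peak concurrent users:", max(load),
          "idle hours:", load.count(0))
```

Under these assumptions the same-timezone pool has everyone contending for 8 hours and the node idle for the other 16, while the spread pool never has more than one active user and never sits idle.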

(Or, darker: Maybe the contention/idle dichotomy is a feature, not a bug. After all, when one has control of $14k/month of hardware that is sitting idle reliably-enough for significant periods every day, then one becomes incentivized to devise a way to sell that idle time for other purposes.)

by vineyardmike8 hours ago|

This is basically why the big companies can sell subscriptions for cheaper than API costs. First priority can go to API users, lower-priority subscription users get slotted in as space/SLO allows, and the remaining idle GPU time is sold to batch users and spare training. Oh, and they shift geography as necessary for different nations' working hours.
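A minimal sketch of that tiering, assuming a single scheduling tick with a fixed number of GPU slots. The tier names come from the comment above; the `schedule` function, the request names, and the strict-priority policy are assumptions for illustration, not how any particular provider is known to work.

```python
import heapq
from itertools import count

# Lower number = served first (assumed ordering: API, then subscription, then batch).
PRIORITY = {"api": 0, "subscription": 1, "batch": 2}


def schedule(requests, capacity):
    """Serve up to `capacity` requests by tier priority; defer the rest.

    `requests` is a list of (name, tier) pairs. Ties within a tier are
    broken by arrival order.
    """
    seq = count()
    heap = [(PRIORITY[tier], next(seq), name) for name, tier in requests]
    heapq.heapify(heap)
    served = [heapq.heappop(heap)[2] for _ in range(min(capacity, len(heap)))]
    deferred = [name for _, _, name in sorted(heap)]
    return served, deferred


requests = [("b1", "batch"), ("s1", "subscription"),
            ("a1", "api"), ("s2", "subscription")]
served, deferred = schedule(requests, capacity=2)
print("served:", served)
print("deferred:", deferred)
```

With two slots, the API request and the first subscription request run; the second subscription request and the batch job wait for spare capacity.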

by petterroea13 hours ago|

To be fair, this is the price you pay for sharing a GPU. It's probably good for stuff that doesn't need to be done "now" but that you can just launch and run in the background. I bet some graphs that show when the GPU is most busy could be useful as well.