A rate limit is essentially a token limit.
reply
It depends on how it's implemented. If it's a fixed window, then your absolute ceiling is the tokens allowed per window times the number of windows in a month. If it's a function of other usage, like a timeshare, you're still paying a fixed price for the month and you get what you get without paying more per token. There's an intrinsic limit based on how many tokens the model can process on that GPU in a month anyway, even if it's only you.
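
To make the two ceilings concrete, here is a rough sketch. The function names and every number in it (window length, tokens per window, throughput) are made up for illustration and are not any provider's actual limits.

```python
def window_token_ceiling(tokens_per_window: int, window_hours: float,
                         days_in_month: int = 30) -> int:
    """Fixed-window ceiling: tokens per window times windows in the month."""
    windows_per_month = int(days_in_month * 24 // window_hours)
    return tokens_per_window * windows_per_month


def hardware_token_ceiling(tokens_per_second: float,
                           days_in_month: int = 30) -> int:
    """Intrinsic ceiling: what the GPU could emit running flat out all month."""
    return int(tokens_per_second * 86_400 * days_in_month)


# Hypothetical plan: 1M tokens per 5-hour window, on hardware doing 1000 tok/s.
plan = window_token_ceiling(1_000_000, 5)   # 144 windows -> 144_000_000
hardware = hardware_token_ceiling(1000)     # 2_592_000_000
effective = min(plan, hardware)             # whichever limit bites first
```

Whichever number is smaller is the real monthly cap, which is the sense in which a rate limit is a token limit.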
reply
Time x capacity is also a limit. There's always a limit.
reply
Is there any way to buy into a pool of people with similar usage patterns? Maybe I'm overthinking it, but just wondering
reply
I think it'd be best to pool with people with different patterns, not the same patterns. Perhaps it would be best to pool with people in different timezones, and/or with different work/sleep schedules.

If everyone in a pool uses it during the ~same periods and sleeps during the ~same periods, then the node would oscillate between contention and idle -- every day. This seems largely avoidable.
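
A toy model of that oscillation, assuming every pool member is active 09:00 to 17:00 local time and using made-up UTC offsets (the function name and schedule are illustrative assumptions, not measured usage):

```python
def hourly_load(utc_offsets, start=9, end=17):
    """Count active users for each UTC hour, given each user's UTC offset."""
    load = [0] * 24
    for offset in utc_offsets:
        for local_hour in range(start, end):
            load[(local_hour - offset) % 24] += 1
    return load


same_zone = hourly_load([0] * 6)                  # six users, one timezone
spread = hourly_load([0, 4, 8, 12, 16, 20])       # six users, spread out

print(max(same_zone), same_zone.count(0))  # 6 16 -> peak of 6, idle 16 h/day
print(max(spread), spread.count(0))        # 2 0  -> peak of 2, never idle
```

Same six users and the same total hours, but the spread pool peaks at 2 concurrent users instead of 6 and the node is never idle.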

(Or, darker: Maybe the contention/idle dichotomy is a feature, not a bug. After all, when one has control of $14k/month of hardware that is sitting idle reliably-enough for significant periods every day, then one becomes incentivized to devise a way to sell that idle time for other purposes.)

reply
This is basically why the big companies can sell subscriptions for cheaper than API costs. First priority can go to API users, lower-priority subscription users get slotted in as space/SLO allows, and then the remaining idle GPU time is sold to batch users and spare training. Oh, and load gets shifted geographically as necessary for different nations' working hours.
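
Roughly what that tiering looks like as a strict-priority queue. This is only a sketch: the tier names, the `schedule` function, and the fixed slot count are assumptions for illustration, and real schedulers presumably weigh SLOs and batching in ways this ignores.

```python
import heapq
from itertools import count

# Lower number = served first. Tier names are illustrative.
PRIORITY = {"api": 0, "subscription": 1, "batch": 2}


def schedule(requests, slots):
    """requests: list of (tier, name). Returns (served, deferred) name lists."""
    tie = count()  # preserves arrival order within a tier
    heap = [(PRIORITY[tier], next(tie), name) for tier, name in requests]
    heapq.heapify(heap)
    served = [heapq.heappop(heap)[2] for _ in range(min(slots, len(heap)))]
    deferred = [name for _, _, name in sorted(heap)]
    return served, deferred


incoming = [("batch", "b1"), ("subscription", "s1"),
            ("api", "a1"), ("subscription", "s2")]
served, deferred = schedule(incoming, slots=2)
print(served)    # ['a1', 's1']
print(deferred)  # ['s2', 'b1']
```

With only two free slots, the API request and the earliest subscription request run now; the rest wait for idle capacity.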
reply
To be fair, this is the price you pay for sharing a GPU. It's probably good for stuff that doesn't need to be done "now" but that you can just launch and run in the background. I bet some graphs showing when the GPU is busiest would be useful as well.
reply