Inference already runs on shared hardware, so you're not getting the system's full bandwidth by default. This most likely just allocates more resources to your request.
Could well be running on Google TPUs.