Hacker News
by maz1b 4 hours ago
tcdent 4 hours ago
Inference is run on shared hardware already, so they're not giving you the full bandwidth of the system by default. This most likely just allocates more resources to your request.
[deleted] 2 hours ago
hendersoon 4 hours ago
Could well be running on Google TPUs.