Hacker News
points
by
WASDx
1 hours ago
by
stymaar
1 hours ago
For token generation, yes: because current-gen LLMs are autoregressive, you have to pay the inter-node latency for every single token.
For prompt processing it would work, though, and it could work for diffusion LLMs as well.
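A rough back-of-envelope sketch of that argument, in case it helps. Everything here is an assumption for illustration: the function names, the simple "model split across nodes in a chain" setup, and all of the numbers are made up, not measurements of any real system. The point is only that decoding pays the hop latency once per generated token, while prompt processing pays it roughly once for the whole prompt.

```python
# Toy latency model for a model split across several nodes in a chain.
# All names and numbers are hypothetical, chosen only to illustrate the argument.

def decode_time_s(num_tokens, compute_per_token_s, hops, hop_latency_s):
    """Autoregressive decoding: each new token depends on the previous one,
    so every token has to cross every inter-node hop before the next can start."""
    return num_tokens * (compute_per_token_s + hops * hop_latency_s)


def prefill_time_s(prefill_compute_s, hops, hop_latency_s):
    """Prompt processing: all prompt tokens go through in one pass,
    so the inter-node latency is paid once rather than once per token."""
    return prefill_compute_s + hops * hop_latency_s


if __name__ == "__main__":
    hops = 3                # assumed number of inter-node hops
    hop_latency_s = 0.05    # assumed 50 ms per hop (e.g. nodes over the internet)

    local = decode_time_s(500, 0.02, hops=0, hop_latency_s=0.0)
    remote = decode_time_s(500, 0.02, hops=hops, hop_latency_s=hop_latency_s)
    prefill = prefill_time_s(2.0, hops=hops, hop_latency_s=hop_latency_s)

    print(f"decode 500 tokens, no network hops: {local:.1f} s")
    print(f"decode 500 tokens, with hops:       {remote:.1f} s")
    print(f"prefill with hops:                  {prefill:.2f} s")
```

With these made-up numbers, the network hops dominate decoding time but barely change the prompt-processing time. The diffusion case is not modeled here.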