Inference from an LLM is O(tokens²): self-attention compares every token in the context against every other token.
Only in naive implementations of attention. With KV caching each decode step is O(n) rather than O(n²), and sub-quadratic variants (e.g. linear attention) change the asymptotics entirely, though optimized kernels like FlashAttention are still O(n²) in compute and only improve memory traffic.
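A minimal NumPy sketch of where the quadratic cost comes from in the naive formulation (function and shapes are illustrative, not from any particular library): the score matrix alone has n × n entries, so doubling the sequence length quadruples the work.

```python
import numpy as np

def naive_attention(q, k, v):
    """Naive single-head attention: materializes the full n x n score
    matrix, so compute and memory scale as O(n^2) in sequence length n."""
    scores = q @ k.T / np.sqrt(q.shape[-1])          # shape (n, n): the quadratic term
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ v                               # shape (n, d)

n, d = 8, 4
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = naive_attention(q, k, v)
print(out.shape)  # (8, 4); the intermediate score matrix was (8, 8)
```

Caching the keys and values lets each new token attend to the existing context in O(n) instead of recomputing the whole matrix, which is why total generation cost, not per-step cost, is where the O(n²) shows up in practice.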