  "447 / 6144 tokens"
  "Generated in 0.026s • 15,718 tok/s"
This is crazy fast. I always predicted we'd reach this speed about 2 years from now, but it's here now.
reply
The full answer pops up in milliseconds. It's impressive, and it feels like a completely different technology just by forgoing the need to stream the output.
reply
We need that for this Chinese 3B model that thinks for 45s on hello world but also solves math.
reply
Because most models today generate fairly slowly, they give the impression of someone typing on the other end. This is just <enter> -> wall of text. Wild.