I wish someone would also thoroughly measure prompt processing speeds across the major providers. Output speeds are useful too, but they are already more commonly measured.

by JLO64 1 hour ago


In my use case for small models, I typically generate at most 100 tokens per API call, so prompt processing takes up the majority of the wait time from the user's perspective. I found OAI's models to be quite poor at this and made the switch to Anthropic's API just for this.

I've found Haiku to be pretty fast at prompt processing (PP), but I would be willing to investigate another provider if it offers faster speeds.
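For anyone wanting to compare providers on this, here is a rough sketch of how the wait could be split into "time before the first chunk" and "rate after the first chunk". It assumes you already have some iterable that yields chunks as they arrive from a streaming response; `measure_stream` and `fake_stream` are made-up names for illustration and not part of any provider SDK. Time to first chunk is only a proxy for prompt processing speed, since it also includes network latency and any queueing on the provider side.

```python
import time
from typing import Callable, Iterable, Iterator, Optional


def measure_stream(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Time a chunk stream: wait before the first chunk vs. rate afterwards."""
    start = clock()
    first: Optional[float] = None
    last: Optional[float] = None
    count = 0
    for _ in stream:
        now = clock()  # timestamp taken as soon as the chunk arrives
        if first is None:
            first = now
        last = now
        count += 1
    if first is None or last is None:
        raise ValueError("stream produced no chunks")
    decode_time = last - first
    return {
        # Proxy for prompt processing: includes network and queueing delay.
        "time_to_first_chunk_s": first - start,
        "chunks": count,
        # Rate over the gaps between chunks; undefined for a single chunk.
        "chunks_per_s_after_first": (
            (count - 1) / decode_time if decode_time > 0 else None
        ),
    }


def fake_stream(prefill_s: float, n_chunks: int, gap_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a provider's streaming response."""
    time.sleep(prefill_s)  # simulated prompt processing delay
    for i in range(n_chunks):
        if i:
            time.sleep(gap_s)  # simulated per-chunk generation delay
        yield "tok"


if __name__ == "__main__":
    print(measure_stream(fake_stream(prefill_s=0.05, n_chunks=5, gap_s=0.01)))
```

To use it against a real API, you would pass in whatever chunk iterator that provider's streaming client returns, and repeat the call several times with the same prompt, since a single measurement is noisy.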

by asselinpaul 30 minutes ago


OpenRouter has this information.

by rattray 1 hour ago


Wow. How fast is Haiku?