For many workflows involving real-time human interaction, such as voice assistants, this is the most important metric. Very few tasks are as sensitive to quality, once a certain response quality threshold has been reached, as the software planning and writing tasks that most HN readers are likely familiar with.

by raw_anon_111138 minutes ago|

The way that voice assistants work, even in the age of LLMs, is:

Voice -> speech to text -> LLM to determine intent -> JSON -> API call -> response -> LLM -> text to speech.

TTFT is irrelevant: you have to process everything through the pipeline before you can generate a response. A fast model is more important than a good model.
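A rough sketch of that argument in Python, in case it helps. Everything here is hypothetical: the `LlmCall` class, the stage names, and all the timings are made-up numbers for illustration, not measurements from any real system. It simply assumes, as described above, that each stage has to finish before the next one starts, so what counts is time to the last token, not the first.

```python
from dataclasses import dataclass


@dataclass
class LlmCall:
    """Hypothetical latency model for one LLM stage (illustrative only)."""
    ttft_s: float        # time to first token, in seconds
    tokens_per_s: float  # decode throughput
    output_tokens: int   # tokens the stage must fully emit

    def time_to_complete(self) -> float:
        # The next stage needs the whole output (e.g. complete JSON),
        # so the stage costs TTFT plus the full decode time.
        return self.ttft_s + self.output_tokens / self.tokens_per_s


def pipeline_latency(stt_s: float, intent: LlmCall, api_s: float,
                     reply: LlmCall, tts_s: float) -> float:
    """Sum of stage latencies, assuming strictly sequential stages."""
    return (stt_s + intent.time_to_complete() + api_s
            + reply.time_to_complete() + tts_s)


# Assumed fixed costs for the non-LLM stages (made-up values).
STT_S, API_S, TTS_S = 0.3, 0.4, 0.3


def turn_latency(ttft_s: float, tokens_per_s: float) -> float:
    # Same model used for both the intent step (60 tokens of JSON)
    # and the reply step (80 tokens of text); token counts are assumptions.
    intent = LlmCall(ttft_s, tokens_per_s, output_tokens=60)
    reply = LlmCall(ttft_s, tokens_per_s, output_tokens=80)
    return pipeline_latency(STT_S, intent, API_S, reply, TTS_S)


low_ttft_slow_decode = turn_latency(ttft_s=0.2, tokens_per_s=20)
high_ttft_fast_decode = turn_latency(ttft_s=0.5, tokens_per_s=100)

if __name__ == "__main__":
    print(f"low TTFT, slow decode:  {low_ttft_slow_decode:.1f} s")
    print(f"high TTFT, fast decode: {high_ttft_fast_decode:.1f} s")
```

With these made-up numbers, the model with the worse TTFT but faster decode finishes the whole turn sooner, because the decode time of both LLM stages dominates the total.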

Source: I do this kind of stuff for call centers. Yes, I know modern LLMs don’t go through the voice -> text -> LLM -> text -> voice pipeline anymore, but that only works when you don’t have to call external sources.