If you're setting this up yourself instead of using a lab's built-in speech functionality, you can run a small LLM in parallel (a local model, or a small hosted one like Haiku) that acts as a gate for whether to run TTS on the response or not. Its only job is to decide whether the transcription it receives is of someone who is done talking, or of someone likely still mid-thought or mid-sentence.
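A minimal sketch of that gate, with hypothetical names throughout (`should_speak`, `respond`, the prompt wording, and the injected `classify`/`generate_reply`/`tts` callables are all illustrative, not any particular API):

```python
# Sketch of an end-of-utterance gate. A small model is asked one yes/no
# question: does this transcription read like a finished thought?

GATE_PROMPT = (
    "You will see a speech transcription fragment. Reply COMPLETE if the "
    "speaker sounds done talking, or INCOMPLETE if they are likely still "
    "mid-thought or mid-sentence.\n\nTranscription: {text}"
)

def should_speak(transcript: str, classify) -> bool:
    """Return True if the gate model judges the utterance finished.

    `classify` is any callable that sends a prompt to the small/local
    gate model and returns its text reply; it is injected here so the
    gating logic stays model-agnostic.
    """
    reply = classify(GATE_PROMPT.format(text=transcript))
    return reply.strip().upper().startswith("COMPLETE")

def respond(transcript: str, classify, generate_reply, tts):
    """Run TTS on the main model's reply only once the gate says the
    speaker is done; otherwise keep buffering audio."""
    if not should_speak(transcript, classify):
        return None  # still mid-thought: wait for more speech
    reply = generate_reply(transcript)
    tts(reply)
    return reply
```

Because the model calls are injected, the gating logic itself can be exercised with stub callables before wiring up a real model.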
reply
I know it's not the perfect solution for you, but I use a voice recorder and send the LLM the transcript. And my god is it working great.

Usually I just explain the things I want it to do. The longest was 30 minutes of rambling, explaining the methods section of a paper in non-chronological order. It worked unbelievably well for me.

reply