Just give me an option to have a slower response but a better model…
Google’s Gemini Flash Live 3.1 is better, especially when used via the API: it can do tool calling (including calling out to other, even smarter LLMs if you wire that up yourself), you can set the reasoning level (even high stays close enough to realtime), and it can ground answers in Google Search. I love bidirectional voice, and right now it’s probably the best option. You can try it in AI Studio.
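To make the tool-calling and Search-grounding setup concrete, here’s a minimal sketch using the `google-genai` Python SDK’s Live API. The model id is a placeholder (check AI Studio for the current one), and `ask_smarter_model` is a hypothetical tool you’d implement yourself to forward hard questions to a slower, stronger LLM:

```python
# Sketch of a Live API session config: Search grounding plus a custom tool.
# Assumptions: google-genai SDK installed, GOOGLE_API_KEY set in the env,
# and "ask_smarter_model" is a tool name invented for this example.
CONFIG = {
    "response_modalities": ["AUDIO"],
    "tools": [
        {"google_search": {}},  # ground answers in Google Search
        {"function_declarations": [{
            "name": "ask_smarter_model",  # hypothetical: you route this yourself
            "description": "Forward a hard question to a slower, stronger LLM.",
            "parameters": {
                "type": "object",
                "properties": {"question": {"type": "string"}},
                "required": ["question"],
            },
        }]},
    ],
}

async def run_session():
    # Network part kept out of module scope; model id is a placeholder.
    from google import genai
    client = genai.Client()
    async with client.aio.live.connect(
        model="gemini-2.0-flash-live-001", config=CONFIG
    ) as session:
        ...  # stream mic audio in, play audio out, answer tool calls as they arrive
```

When the model decides your question needs more horsepower, it emits a tool call instead of answering; your code runs the slow LLM and streams the result back into the live session.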
Another option is to use pipecat with its VAD plus separate STT and TTS stages and any (fast) LLM of your choice — but it’s more plumbing, and it’s not a true speech-to-speech model.
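This is not pipecat’s actual API — just a stdlib sketch of the shape of the plumbing it wires up for you: VAD segments the mic stream, STT transcribes, an LLM replies, TTS speaks it. All stage names here are placeholders:

```python
# The cascaded pipeline, reduced to its data flow.
def run_turn(audio_chunks, vad, stt, llm, tts):
    """One conversational turn through the VAD -> STT -> LLM -> TTS cascade."""
    speech = [c for c in audio_chunks if vad(c)]  # keep only voiced audio
    text = stt(speech)                            # transcribe the utterance
    reply = llm(text)                             # generate a response
    return tts(reply)                             # synthesize speech

# Stub stages to show the flow; real ones would be Whisper/Deepgram, etc.
out = run_turn(
    ["hello", "", "world"],
    vad=lambda chunk: bool(chunk),
    stt=lambda chunks: " ".join(chunks),
    llm=lambda text: f"You said: {text}",
    tts=lambda text: text.upper(),
)
print(out)  # YOU SAID: HELLO WORLD
```

Each stage adds its own latency, which is why the cascade never feels quite as snappy as a native speech-to-speech model — but you get to pick best-in-class components for each step.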
Do you know why this is a thing? Despite the app technically running Gemini, I find it quite crap, while the AI Studio version with thinking is my favorite LLM. Very jarring tbh.
But personally I've settled on just speaking to the slower models through a custom TTS app. I found that instant responses weren't actually that important, and in the silences I find myself marinating in the discussion more anyway.