Another option is to use Pipecat with its VAD, separate STT and TTS, and any (fast) LLM of your choice. It’s more plumbing though, and not a true speech-to-speech model.
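For anyone wondering what that plumbing roughly looks like, here is a minimal sketch of the cascaded VAD → STT → LLM → TTS loop. To be clear, this is not Pipecat's API: `CascadedVoicePipeline` and the four callables are hypothetical placeholders, and it assumes the simplest possible turn-taking (the first silent frame after speech ends the turn).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


# Hypothetical placeholder class, NOT Pipecat's API. Each stage is a plain
# callable so any VAD / STT / LLM / TTS implementation can be plugged in.
@dataclass
class CascadedVoicePipeline:
    is_speech: Callable[[bytes], bool]   # VAD: does this audio frame contain speech?
    transcribe: Callable[[bytes], str]   # STT: utterance audio -> text
    generate: Callable[[str], str]       # LLM: user text -> reply text
    synthesize: Callable[[str], bytes]   # TTS: reply text -> audio

    def run(self, frames: Iterable[bytes]) -> Iterator[bytes]:
        """Yield one synthesized reply per detected user utterance."""
        utterance: List[bytes] = []
        for frame in frames:
            if self.is_speech(frame):
                utterance.append(frame)
            elif utterance:
                # Assumption: a silent frame right after speech ends the turn.
                yield self._respond(utterance)
                utterance = []
        if utterance:
            # Flush speech that ran up to the end of the stream.
            yield self._respond(utterance)

    def _respond(self, utterance: List[bytes]) -> bytes:
        text = self.transcribe(b"".join(utterance))
        reply = self.generate(text)
        return self.synthesize(reply)


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any audio or model dependencies.
    pipeline = CascadedVoicePipeline(
        is_speech=lambda frame: frame != b"\x00",
        transcribe=lambda audio: audio.decode(),
        generate=lambda text: text.upper(),
        synthesize=lambda text: text.encode(),
    )
    for reply in pipeline.run([b"h", b"i", b"\x00", b"y", b"o"]):
        print(reply)
```

Every hop here adds latency and throws away tone and prosody at the text boundary, which is the trade-off versus a true speech-to-speech model.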
Do you know why this is a thing? Despite the app technically being Gemini, I find it quite crap, while the AI Studio thing with thinking is my favorite LLM. Very jarring tbh.