For voice conversations, the bigger issue may be latency rather than filling the context window. It's hard to say without knowing the site, but if he had multiple pages' worth of text (dunno, types of cars, procedures, some emotional story, etc.) and a "slower" model, it might be worth using RAG to quickly preselect a small portion and then letting the LLM refine the answer.
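Roughly, the "preselect, then refine" idea could look like the minimal sketch below. It assumes plain keyword (TF-IDF-style) scoring instead of embeddings, and the names `chunk_text`, `retrieve` and `build_prompt` are hypothetical. The actual LLM call is left out. The point is that only the top-k chunks go into the prompt, so the model sees a small context and can answer faster.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; a real setup would more likely use embeddings.
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_text(text: str, max_words: int = 50) -> list[str]:
    # Split the site's text into fixed-size word windows (assumed chunking strategy).
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_idf(chunks: list[str]) -> dict[str, float]:
    # Inverse document frequency: rarer terms across chunks get higher weight.
    n = len(chunks)
    df = Counter()
    for chunk in chunks:
        df.update(set(tokenize(chunk)))
    return {term: math.log((n + 1) / (d + 1)) + 1 for term, d in df.items()}


def score(query_tokens: list[str], chunk: str, idf: dict[str, float]) -> float:
    # Sum of term frequency * IDF for every query token found in the chunk.
    counts = Counter(tokenize(chunk))
    return sum(counts[t] * idf.get(t, 0.0) for t in query_tokens)


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Cheap preselection step: rank all chunks and keep only the top k.
    idf = build_idf(chunks)
    query_tokens = tokenize(query)
    ranked = sorted(chunks, key=lambda c: score(query_tokens, c, idf), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context_chunks: list[str]) -> str:
    # Small prompt for the (hypothetical) LLM call that refines the answer.
    context = "\n---\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

With a handful of chunks about car models, test drives and warranties, a question like "How do I book a test drive?" would pull only the test-drive chunk into the prompt instead of every page.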