Show HN: Open-source customizable AI voice dictation built on Pipecat

(github.com)

27 points

by kstonekuan55 days ago |

16 comments

by james_marks53 days ago|

[-]

Looks interesting!

How are you feeling about Tauri as you take this to a larger audience?

I’ve dabbled with it and thought it was compelling, but not supported a release.

by kstonekuan53 days ago|

parent|

[-]

I started off trying Electron, but I am liking Tauri more. It seems Rust has better support for system-level integration, like controlling audio and keys. Are there other alternatives you are exploring?

by james_marks53 days ago|

parent|

[-]

Yeah, Tauri struck me as much more powerful and lighter than Electron.

My hobby project with Tauri died when I managed to set an OS-wide shortcut, which is amazing.

Except it broke many other apps, and I just never got back to it.

by kstonekuan52 days ago|

parent|

[-]

Tauri v2 has this global shortcut plugin which I am using and it works amazing

https://v2.tauri.app/plugin/global-shortcut/

by Darell2751 days ago|

prev|

[-]

The level of customizability is quite promising. Is your vision for the app to be all in one, or perhaps one day serve an online customizable backend as a platform as a service?

by popalchemist55 days ago|

prev|

[-]

The critiques about local inference are valid, if you're billing this as an open source alternative to existing cloud based solutions.

by kstonekuan55 days ago|

parent|

[-]

Thanks for the feedback, probably should have been clearer in my original post and in the README as well. Local inference is already supported via Pipecat, you can use ollama or any custom OpenAI endpoint. Local STT is also supported via whisper, which pipecat will download and manage for you.

by popalchemist54 days ago|

parent|

[-]

Rad. put that front and center on the readme.

by kstonekuan54 days ago|

parent|

[-]

Updated!

by lrvick55 days ago|

prev|

[-]

Is there a way to do this with a local LLM, without any internet access needed?

by kstonekuan55 days ago|

parent|

[-]

Yes, Pipecat already supports that natively, so this can be done easily with ollama. I have also built that into the environment variables with `OLLAMA_BASE_URL`.

About ollama in pipecat: https://docs.pipecat.ai/server/services/llm/ollama

Also, check out any provider they support, and it can be easily onboarded in a few lines of code.

by bryanwhl54 days ago|

prev|

[-]

Does this work on macos?

by kstonekuan54 days ago|

parent|

[-]

Yup, the desktop app is built with Tauri, which is cross-platform compatible, and I have personally tested it on macos and windows

by grayhatter55 days ago|

prev|

[-]

I don't think I'd call anything that only works with a proprietary Internet hosted LLM (one you need an account to use) open-source.

This is less voice dictation software, and much more a shim to [popular LLM provider]

by mgsgde46 days ago|

parent|

[-]

I prefer Gemini 2.0 Flash because it is both faster and more capable than local large language models. In this context, I am not concerned about Google having my data; my primary objective is ensuring my government does not gain access to it ;)

by kstonekuan55 days ago|

parent|

prev|

[-]

Hey, sorry if the examples given were not robust, but because this is built on Pipecat, you can actually very easily swap to a local LLM if you prefer that, and the project is already set up to allow you to do that via environment variables.

The integration to set up the WebRTC connection, get the voice dictation working seamlessly from anywhere, and input into any app took a long time to build out, and that's why I want to share this open source.

by 46 days ago|

parent|

[-]

deleted