edit: never mind, found info in the docs about how to enable post-processing. I'd still be interested in your prompt, though, if you don't mind sharing!
This is the prompt I use (it's probably overkill and can be condensed):
It's the model doing the work inside the wrapper that an app provides.
https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2
https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3
It's almost instant on my new M5 Max w/ 36GB of memory, but I used both with Handy on my previous 2019 Intel Mac w/ 16GB of memory and was surprised at just how fast it was for running on-device! Not instant, but only a couple of seconds.
Transcription this good used to cost A LOT; now it rounds down to free.