But Apple's uses so few system resources and runs fully on-device on newer iPhone models (16+, I believe). It's so efficient. I really enjoy using Handy with Parakeet as the model, and the results are very good, but its system resource usage is a monster compared to Apple's.

Looks like Wispr Flow uses a cloud model [0]:

> Cloud based speech processing infrastructure for 1B users

It gets to be a messy comparison: my iPhone can do STT pretty well, fully on-device and with no latency, while Wispr Flow requires a cloud model. To be fair, older Apple devices do as well. It's not an apples and oranges comparison, but I think those technical details make this a non-direct comparison in a few ways.

For on-device with low system resource usage, Apple's is pretty damn good.

[0] https://wisprflow.ai/post/technical-challenges

by RobMurray 12 hours ago

Apple's STT has been on-device for a long time now, long before iPhone 16. I haven't noticed any improvements since my first ever iPhone 5s. I'm pretty sure Wispr Flow can use on-device models. I use VoiceInk[0], which can use Parakeet models on-device and can optionally use cloud models. It's like night and day comparing Apple's to VoiceInk. The only advantage I find to Apple's STT is less friction. Third-party apps just can't integrate as smoothly with the system. There's a gesture to activate Apple dictation when VoiceOver is on.

by georgel 10 hours ago

It's been around and available as an API to devs since at least 2021 in iOS. The problem was even on the best iPhone at that time, I could never get it past ~0.8x speed and after 15-20 minutes the device would heat up so much the display dimmed.

For context, I was working on a podcast app with on-device transcription, had to park that idea for years before it got to today's performance.

by arijun 13 hours ago

Apple runs on-device on older models, too, just wimpier models.

by Invictus0 12 hours ago

Human resources (my voice and time) are far more valuable than system resources. Going to the cloud is absolutely worth it to prevent a typo.

by rhdunn 12 hours ago

That doesn't work if you have limited or no connectivity (e.g. on a mountain range). There are also privacy concerns, e.g. a doctor using it to transcribe medical information.

by adamcharnock 14 hours ago

FWIW - I also really like Wispr Flow, but I moved to running the 'Whisper Large' model locally using Handy (https://github.com/cjpais/Handy), which has been essentially as good, while also having lower latency.

by dceddia 12 hours ago

Handy is great. It exposes a bunch of open models beyond Whisper too, and though I haven’t tried too many of them, I’ll throw in a rec for the Parakeet model, which feels pretty much on par with Whisper for accuracy and is way, way faster.

by primaprashant 10 hours ago

I’d say STT is pretty much a solved problem. Every day there is a new product, and one can be one-shotted by any current top-of-the-line LLM. Take a look at this [1]. Apple is just stuck in the past.

[1] https://github.com/primaprashant/awesome-voice-typing