Wispr Flow is a masterclass in STT. Apple's solution feels like it's from the last century in comparison. The same applies to Apple's TTS when you have ElevenLabs and OpenAI running laps around it. All I need is for my iPhone to do those things natively at the same quality level (because in Apple's walled garden that's the only way to get them usable everywhere).
reply
But Apple's uses so few system resources and runs fully on device on newer iPhone models (16+ I believe). It's so efficient. I really enjoy using Handy with Parakeet as the model, and it's very good, but its system resource usage is a monster compared to Apple's.

Looks like Wispr Flow uses a cloud model [0]:

> Cloud based speech processing infrastructure for 1B users

It gets to be a messy comparison: my iPhone can do STT pretty well with no latency, fully on device, while Wispr Flow requires a cloud model, though to be fair, older Apple devices do as well. It's not an apples-to-oranges comparison, but I think those technical details make this a non-direct comparison in a few ways.

For on-device with low system resource usage, Apple's is pretty damn good.

[0] https://wisprflow.ai/post/technical-challenges

reply
Apple's STT has been on-device for a long time now, long before iPhone 16. I haven't noticed any improvements since my first ever iPhone 5S. I'm pretty sure Wispr Flow can use on-device models. I use Voiceink[0], which can use Parakeet models on-device and can optionally use cloud models. It's like night and day comparing Apple's to Voiceink. The only advantage I find to Apple's STT is less friction. 3rd party apps just can't integrate as smoothly with the system. There's a gesture to activate Apple dictation when VoiceOver is on.
reply
It's been around and available as an API to devs since at least 2021 in iOS. The problem was even on the best iPhone at that time, I could never get it past ~0.8x speed and after 15-20 minutes the device would heat up so much the display dimmed.

For context, I was working on a podcast app with on-device transcription, had to park that idea for years before it got to today's performance.
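
For a rough sense of what ~0.8x means for a podcast app, here is a back-of-the-envelope sketch. It assumes "0.8x" means the transcriber gets through 0.8 minutes of audio per minute of wall-clock time; the function name and the 60-minute episode are made up for illustration and are not from the app itself.

```python
def transcription_minutes(audio_minutes: float, speed_factor: float) -> float:
    """Wall-clock minutes needed to transcribe audio at a given speed factor.

    Assumption: speed_factor is audio minutes processed per wall-clock minute,
    so 1.0 is real time and anything below 1.0 is slower than playback.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_minutes / speed_factor


# Hypothetical 60-minute episode at the ~0.8x mentioned above.
episode = 60.0
needed = transcription_minutes(episode, 0.8)
print(f"{episode:.0f} min of audio takes about {needed:.0f} min to transcribe")
```

Under that reading, transcribing an episode takes longer than listening to it, and the 15-20 minute overheating mark arrives well before a full episode is done.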

reply
Apple runs on-device on older models, too, just wimpier models.
reply
human resources (my voice and time) are far more valuable than the system resources. going to the cloud is absolutely worth it to prevent a typo
reply
That doesn't work if you have limited or no connectivity (e.g. on a mountain range). There are also privacy concerns, e.g. a doctor using it to transcribe medical information.
reply
FWIW - I also really like Wispr Flow, but I moved to running the 'Whisper Large' model locally using Handy (https://github.com/cjpais/Handy), which has been essentially as good, while also having lower latency.
reply
Handy is great. It exposes a bunch of open models beyond Whisper too, and though I haven’t tried too many of them, I’ll throw in a rec for the Parakeet model, which feels pretty much on par with Whisper for accuracy and is way, way faster.
reply
I’d say STT is pretty much a solved problem. Every day there is a new product, and one can be one-shotted by any current top-of-the-line LLM. Take a look at this [1]. Apple is just stuck in the past.

[1] https://github.com/primaprashant/awesome-voice-typing

reply
Until siri can reliably handle "Navigate to <business that is a decade old>", offline and using pre-downloaded maps, I'm going to assume all the other, harder speech to text and conversational stuff is just vaporware.

I found another dreadful iPhone input "feature" yesterday. If you are browsing around in third-party CarPlay apps and are ready to tap your selection, but press the accelerator first, it truncates the list to only a few items and scrolls to the top.

Way to reduce driving distractions guys! What's next? If the car is moving, maps changes destinations?

I really wish human-computer interaction research were more broadly applied, and that if you did dumb stuff like the whole automotive / CarPlay world does, you'd be liable in court.

I once had a car that hid the backup cam behind a legal disclaimer every time you turned it on. I'm sure at least one pedestrian was hit by a car in reverse while that screen was on. The manufacturer should be 100% liable for the poor UI decision.

reply
I think their intent is actually safety. They employ two touch interaction models: Flexible while not moving and simplified while driving. For instance, keyboard input becomes unavailable while moving and you must rely on Siri. I personally find it irritating, particularly when I am a passenger, but I get it.
reply
It just means people pick up their phone instead.
reply
> Until siri can reliably handle "Navigate to <business that is a decade old>", offline and using pre-downloaded maps

Yeah, that's unfortunate considering you can have it do nearly all of that (download maps, navigate to business all while offline), except asking siri to do it for you.

> I once had a car that hid the backup cam behind a legal disclaimer every time you turned it on.

My car pops up a dialog telling me (in a paragraph+) to pay attention while in semi-autopilot which I have to click "ok" on to get back to the map. It's very ironic, and extremely dangerous.

reply
Yeah, they could just take the open Whisper model, which really is great, especially if you use the larger-parameter versions. I love it.
reply
I don't think things have improved much on that front since Colin Hughes gave a rundown of Voice Control's problems several years ago:

https://www.theregister.com/on-prem/2023/08/16/those-who-rel...

Would be great if they could at least fix two major bugs:

* input simply fails (seemingly) randomly where it is supported, and many apps from major vendors don't support dictation input at all (e.g. OneNote); there should at least be a fallback (a la Dragon Dictate from decades ago) for those cases
* capitalization is still random, leaving you with many errors to correct

but Apple mostly seems to see accessibility as something to enable performative press releases, not actual functionality...

reply
I think the random capitalization problem has gotten much better in iOS 26, or one of its minor releases (I recall in the 26.0 beta it was still there). I would have sworn before they would never fix it…

The streaming dictation they added in that release is also much appreciated, although occasionally buggy.

reply
All day every day my iPhone makes me feel like an idiot. I need to correct every other word I type (or at least what my iPhone thinks I typed). While correcting, autocorrect introduces new and even more baffling misspellings.

Sometimes it gets to “fever dream where you’re suddenly unable to successfully perform everyday tasks” levels of insanity.

And the worst part is: it used to be fine. I’d type at more or less full-keyboard levels of speed and accuracy on my iPhone 4S.

reply
One thing that helped me a lot to fix the iPhone keyboard was turning off slide to type. I learned that tip here on HN actually!

1. Open your Settings app.
2. Tap on General.
3. Scroll down and select Keyboard.
4. Toggle off Slide to Type.

reply
It’d be amazing if speech-to-text could take into account context as well: Greek if I’m speaking Greek, Korean if I’m speaking Korean, or for (int i = 0; i < count; ++i) if I’m dictating code.
reply
Apple dictation on macOS is actually pretty dang good. I've got it bound to a double-tap on fn and I use it pretty regularly.
reply
try wisprflow and then tell us it's good
reply
Wispr Flow is not $12/month better than iOS.

I’d much rather have “cheap, dependable, and good enough” over oligarch pricing for what used to be a one time software purchase any day.

reply
I just installed this and already despise its pricing model. I trust this product approximately zero.
reply
Open-source STT apps are plentiful and just as good. Pick one from this list:

https://github.com/primaprashant/awesome-voice-typing

reply
there are plenty of free alternatives using the same models
reply
Speaking of touch though, they mustn't have touched the swipe-typing feature in a while, because somehow it works even better than the keyboard for me most of the time! No nonsense words like "oul" instead of "oil" constantly.
reply
I use Aqua Voice because Apple STT is so frustrating.
reply
I turned off my iPhone’s autocorrect because it made too many stealth errors. Now I notice all my mistypes.

I have a friend named Zi in my contacts. For some reason iOS kept autocorrecting “I” to “Zi” and would do it too far back for me to notice.

What’s weird is that this is the kind of dumb bug Apple usually irons out.

reply
I want to echo the comments that you just made.

One of my primary methods of interacting with an iPhone is through speech and the state of Apple speech transcription is pretty horrible. It bothers me greatly.

I know some of the workarounds and things, but it does feel like it’s in the Stone Age.

I don’t think it’s a microphone issue since iPhone microphones are fairly decent and I don’t think it’s a CPU issue either because Apple Silicon seems to be some of the best on the market. Which leaves us with the software…

Maybe they should put that cash hoard to good use and buy up some of these transcription companies or license their IP so we get truly high-quality transcription.

reply
There’s so much complaining about their keyboard issues, and it’s really an infuriating part of the iOS experience. The phone being hard to grip/slippery doesn't help, no…
reply