I ran into quite a few out-of-memory issues in iOS Safari when I was building continuous voice recognition for my blind chess game, so people could play while on the go.
I originally tried to get away with just Whisper Tiny in the chess game [2], but it performs worse on the kinds of short phrases (knight E4, c takes d5, etc.) used to dictate chess notation. Even with hotword-based phrasing and corrections, I found its accuracy on such brief inputs noticeably poorer. So I switched over to Sherpa [3] trained on GigaSpeech. It’s significantly more accurate, but it also comes with a correspondingly larger memory footprint.
Ideally, I would have used just one engine, but I needed a fallback for iOS devices (especially older ones), which can easily OOM.
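Roughly, the fallback decision can be sketched like this. This is only an illustration, not the game's actual code: the engine labels, the `DeviceHints` shape, and the `crashedBefore` flag are all made up for the example, and the user-agent check is a simplification (recent iPads can report a desktop-style user agent, so real detection would need more than this).

```typescript
// Hypothetical names throughout; none of this is taken from the actual game.
type Engine = "sherpa-gigaspeech" | "whisper-tiny";

interface DeviceHints {
  // The browser's user-agent string, e.g. navigator.userAgent.
  userAgent: string;
  // Assumed flag the app persists itself if a previous session
  // died while loading the larger model.
  crashedBefore?: boolean;
}

// Simplified check: matches the classic iOS device tokens only.
function isIOS(userAgent: string): boolean {
  return /iPhone|iPad|iPod/.test(userAgent);
}

// Prefer the more accurate (and heavier) engine, and drop to the
// smaller one wherever running out of memory is a realistic risk.
function pickEngine(hints: DeviceHints): Engine {
  if (isIOS(hints.userAgent)) return "whisper-tiny";
  if (hints.crashedBefore) return "whisper-tiny";
  return "sherpa-gigaspeech";
}
```

In a browser this would be called once at startup with something like `pickEngine({ userAgent: navigator.userAgent })`, before any model files are downloaded, so the larger model is never loaded on a device that probably can't hold it.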
[1] https://github.com/snakers4/silero-vad
[1] https://github.com/onnx/onnx/blob/main/onnx/onnx.proto#L605