Same, I really like the ONNX format. I only wish ONNX models weren't so frustratingly difficult to use on Apple iOS. Apple's browser engine, WebKit, has become annoyingly restrictive over the years in terms of the per-page working memory cap.

I ran into quite a few out-of-memory issues in iOS Safari when I was building continuous voice recognition for my blind-accessible chess game, so people could play while on the go.

reply
Interesting, what use cases are you using ONNX for, btw?
reply
So I use a VAD ONNX model (Silero [1]) to automatically detect when someone is talking, and then send the audio into one of the voice recognition libraries.
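
In case it's useful, here's a minimal sketch of the gating idea in Python. The `speech_prob` callback is a hypothetical stand-in for the Silero model's per-frame output, and the pre-roll/hangover counts are made-up defaults; the point is just that a little padding around detected speech keeps the recognizer from getting clipped word onsets and endings.

```python
from collections import deque

def gate_speech(frames, speech_prob, threshold=0.5, preroll=2, hangover=3):
    """Yield only the frames in and around detected speech.

    frames:      iterable of audio chunks.
    speech_prob: callback returning a 0..1 speech probability per frame
                 (stand-in for the Silero VAD model here).
    preroll:     recent non-speech frames replayed before speech starts,
                 so word onsets aren't clipped.
    hangover:    trailing non-speech frames kept after speech ends,
                 so word endings aren't clipped.
    """
    buffer = deque(maxlen=preroll)  # most recent non-speech frames
    trailing = 0
    for frame in frames:
        if speech_prob(frame) >= threshold:
            yield from buffer       # emit pre-roll first
            buffer.clear()
            trailing = hangover
            yield frame
        elif trailing > 0:          # still inside the hangover window
            trailing -= 1
            yield frame
        else:
            buffer.append(frame)    # silence: remember it as pre-roll
```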

I originally tried to get away with just Whisper Tiny in the chess game [2], but it performs worse on the kinds of short phrases (knight E4, c takes d5, etc.) used to dictate chess notation. Even with hotword-based phrasing and corrections, I found its accuracy on brief inputs noticeably poorer. So I switched over to Sherpa [3] trained on GigaSpeech. It's significantly more accurate, but it also comes with a correspondingly larger memory footprint.

Ideally, I would have used just one engine, but I needed a fallback for iOS devices (especially older ones), which can easily OOM.
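
The fallback choice boils down to a simple budget check. A sketch with purely illustrative numbers and engine names (WebKit's real per-page cap varies by device and OS version, and these aren't the thresholds I actually ship):

```python
def pick_engine(device_memory_gb, is_ios):
    """Choose an ASR engine under a rough memory budget.

    Hypothetical thresholds: the Sherpa/GigaSpeech model needs the
    larger footprint; Whisper Tiny is the fallback for constrained
    devices.
    """
    # WebKit caps a page's working memory well below physical RAM,
    # so apply a stricter budget on iOS (illustrative 1 GB cap).
    budget_gb = min(device_memory_gb, 1.0) if is_ios else device_memory_gb
    return "sherpa-gigaspeech" if budget_gb >= 1.0 else "whisper-tiny"
```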

[1] - https://github.com/snakers4/silero-vad

[2] - https://shahkur.specr.net

[3] - https://github.com/k2-fsa/sherpa-onnx

reply
RNNoise has a built-in VAD that works much better than Silero's.

https://github.com/xiph/rnnoise

reply
Most ONNX files are fp32, but the ONNX format actually allows fp16, int8, etc. as well (see onnx.proto for the full list of dtypes [1] - they even have fp8/fp4 these days!). I ended up switching over to fp16 ONNX models for my own web-based inference project since the quality is ~identical and page loads get 2x faster.

[1] https://github.com/onnx/onnx/blob/main/onnx/onnx.proto#L605
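
The ~2x load-time win falls straight out of the element sizes: fp32 is 4 bytes per parameter, fp16 is 2. A back-of-the-envelope sketch (ignores graph/proto overhead, which is small for large models):

```python
# Bytes per element for a few ONNX tensor dtypes.
BYTES = {"fp32": 4, "fp16": 2, "int8": 1}

def download_mb(param_count, dtype):
    """Rough model download size in MB: parameters x bytes per element."""
    return param_count * BYTES[dtype] / 1e6

# e.g. a 40M-parameter model: ~160 MB in fp32 vs ~80 MB in fp16,
# which is where the ~2x page-load speedup comes from.
```

The actual conversion is done offline; onnxconverter-common ships float16 conversion utilities for this, if I remember correctly.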

reply
Thanks for the pointer. I need to take a look at that version of the spec.
reply
Yeah, it's pretty cool what a 2 GB NN can do from a single image.
reply