Curious what backs this assertion. As a counterpoint, we’ve been running 200+ models in production for more than 5 years: language models, embedding models, and classifiers, ranging from the low tens of millions to a hundred million parameters. Traffic is on the order of 1-2M requests/day, and everything is enabled by ONNX with some cgo (or Rust) plumbing on top. What’s your SLA?
reply
Ahh, I should probably have added some context around my hyperbole. I was referring to real-time computer vision, e.g. segmenting FHD/UHD video.
reply
Strong statement to make when I have at least 2 datapoints contradicting it, in SaaS and embedded/robotics.
reply
OpenTrack uses it for its AI headtracking, which works extremely well.
reply
How are you supposed to use TensorRT on iOS, iPadOS, Android, or even the web? Production is not only cloud.
reply
You can use ONNX Runtime with the TensorRT execution provider as its backend, so one does not exclude the other.
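
For anyone who hasn't tried it, here is a rough sketch of what that looks like from Python. The helper names `pick_providers` and `make_session` are made up for this example, and it assumes an `onnxruntime` build that ships the TensorRT and CUDA execution providers. Anything not available is dropped from the list, with CPU as the last resort.

```python
# Preference order: TensorRT first, then CUDA, then plain CPU.
PREFERRED = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]


def pick_providers(available, preferred=PREFERRED):
    """Keep the preferred providers that this build offers, in order.

    Hypothetical helper for illustration; falls back to CPU only.
    """
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]


def make_session(model_path):
    """Create an InferenceSession using the best available provider.

    Hypothetical helper; onnxruntime is imported lazily so the
    selection logic above can be exercised without it installed.
    """
    import onnxruntime as ort

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

The same model file then runs on a TensorRT-capable box and on a CPU-only machine; only the provider list changes.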
reply
Production doesn't have to be performance-sensitive, so devex may still outweigh the performance differences in some scenarios.
reply
We use this in production:

https://docs.rs/onnxruntime/latest/onnxruntime/

It’s a Rust wrapper around ONNX Runtime. We currently serve 5+ million inference requests per day for a highly performance-sensitive application, for a long list of major enterprise clients. We don’t use GPUs for inference, because it would be cost-prohibitive. We launch tens of thousands of VMs per day to run these workloads.

reply
I've never understood how anyone comes into contact with it and thinks it's anything more than an incredible inconvenience masked as the easy way of doing things. I've given it a few good shakes for various uses and regretted the time spent each time.
reply
Ummm, embedded robotics is all about this. Has been for years.
reply