Curious what backs this assertion. As a counterpoint, we’ve been running 200+ models in production for more than 5 years: language models, embedding models, and classifiers, ranging from the low tens of millions to around a hundred million parameters. Traffic is on the order of 1-2M requests/day, and everything is enabled by ONNX with some cgo (or Rust) plumbing on top. What’s your SLA?

by nnevatie 4 hours ago

Ahh, I should have probably added some context around my hyperbole. I was referring to real-time computer vision - think of e.g. segmenting FHD/UHD video.

by snovv_crash 11 hours ago

Strong statement to make when I have at least 2 datapoints contradicting it, in SaaS and embedded/robotics.

by dTal 27 minutes ago

OpenTrack uses it for its AI headtracking, which works extremely well.

by pzo 8 hours ago

How are you supposed to use TensorRT on iOS, iPadOS, Android, or even the web? Production is not only cloud.

by OvervCW 10 hours ago

You can use ONNXRuntime with a TensorRT backend, so one does not exclude the other.
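
For anyone who hasn't tried it, here is a minimal sketch of that setup using ONNX Runtime's Python API. The preference order (TensorRT, then CUDA, then CPU) and the helper names `pick_providers` and `make_session` are my own assumptions for illustration, not anything from the parent comment. The idea is that you ask for TensorRT first and fall back if the installed build doesn't include it.

```python
# Provider names as used by ONNX Runtime; the ordering is an assumed preference.
PREFERRED = (
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
)


def pick_providers(available, preferred=PREFERRED):
    """Return the preferred providers that are actually available, in order."""
    chosen = [p for p in preferred if p in available]
    # Fall back to CPU if nothing in the preference list is available.
    return chosen or ["CPUExecutionProvider"]


def make_session(model_path):
    """Create an InferenceSession (hypothetical helper; needs onnxruntime installed)."""
    import onnxruntime as ort  # imported lazily so pick_providers has no dependencies

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

On a machine without a TensorRT-enabled build, `pick_providers` just drops that entry, so the same code should run on a CPU-only box.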

by gunalx 11 hours ago

Production doesn't have to be performance-sensitive, so devex may still outweigh the performance differences in some scenarios.

by antonvs 7 hours ago

We use this in production:

https://docs.rs/onnxruntime/latest/onnxruntime/

It’s a Rust wrapper around ONNX Runtime. We currently serve 5+ million inference requests per day for a highly performance-sensitive application, for a long list of major enterprise clients. We don’t use GPUs for inference, because it would be cost-prohibitive. We launch tens of thousands of VMs per day to run these workloads.

by monster_truck 7 hours ago

I've never understood how anyone comes into contact with it and thinks it's anything more than an incredible inconvenience masked as the easy way of doing things. I've given it a few good shakes for various uses and regretted the time spent each time.

by cik 6 hours ago

Ummm embedded robotics is all about this. For years.