Consider an exhaust condensation cloud coming from a vehicle's tailpipe -- it can be opaque to a camera/computer-vision system. Can you model your way out of that? Or is it also useful to fuse the vision data with radar (to which the cloud is transparent) and other modalities like lidar? A multi-modal sensor feed is going to simplify the model, which ultimately translates into lower compute load.
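To make the point concrete, here is a minimal sketch of confidence-weighted late fusion of two independent range estimates. All names (`Detection`, `fuse`, the confidence values) are hypothetical illustrations, not any production stack's API; the idea is just that when one modality is blinded, the other still carries the detection, so no clever per-sensor modeling is needed.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class Detection:
    distance_m: float   # estimated range to the obstacle
    confidence: float   # sensor's self-reported confidence, 0..1

def fuse(camera: Optional[Detection],
         radar: Optional[Detection]) -> Optional[Detection]:
    """Confidence-weighted late fusion of two range estimates.

    If the camera is blinded (e.g. staring into an exhaust cloud),
    its confidence drops toward zero and the radar estimate dominates.
    """
    dets = [d for d in (camera, radar) if d is not None and d.confidence > 0]
    if not dets:
        return None
    total = sum(d.confidence for d in dets)
    fused_range = sum(d.distance_m * d.confidence for d in dets) / total
    # Treat sensors as independent: fused confidence is the probability
    # that at least one of them is right.
    fused_conf = 1.0 - math.prod(1.0 - d.confidence for d in dets)
    return Detection(fused_range, fused_conf)

# Camera nearly blinded by the cloud, radar sees through it:
result = fuse(Detection(38.0, 0.1), Detection(40.0, 0.9))
```

With those toy numbers the fused range lands at 39.8 m, i.e. almost entirely the radar's estimate, without the vision model having to reason about condensation at all.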
Even if it’s an intelligence problem, it’s possible that machine intelligence won’t reach the point where it can resolve such cases anytime soon, whereas more sensors might circumvent the issue entirely. It’s like Musk’s big claim (that humans drive using vision alone): the question is not whether a good enough brain could drive vision-only, but whether Tesla can build that brain.
I am skeptical that Tesla has this solved, but I'm interested in seeing how it goes as they move to expand their robotaxi service.
Sensors or intelligence, at the end of the day it’s an engineering problem, and engineering problems don’t require pure solutions. Sometimes sensors break and cameras get covered in mud.
The real problem is maintaining an acceptable level of quality at the lowest possible price, and at some point you're spending more on clever algorithms and researchers than a lidar unit would cost.