Also, even with the suite of sensors that humans have, their visual perception is frequently inadequate and leads to crashes. If vision were good enough, "SMIDSY" wouldn't be such an infamous acronym in vehicle injury cases.

by kelnos 4 hours ago

For those of us not aware of Australian cycling jargon, "SMIDSY" means "Sorry, Mate, I Didn't See You".

by anthonypasq 3 hours ago

The issue is clearly attention, not vision, when it comes to humans. If we could actually process 100% of the visual information in our field of view, accidents would probably go down a shitload.

by gpm 26 minutes ago

Attention is perhaps the limiting factor, but being able to look in two directions at once would help, and would help greatly if we had more attention capacity. E.g. any time you change lanes you have to alternate between looking behind, beside, and in front, and that greatly slows your reaction should something unexpected happen in the direction you aren't currently looking...

by kube-system 3 hours ago

Humans have both issues. There are many human failures which are distinctly a vision issue and not attention related, e.g. misestimation of depth/speed, obscured or obstructed vision, optical focus issues, insufficient contrast or exposure, etc.

by arijun 2 hours ago

But how many of those crashes not caused by inattention could have been avoided with less idiocy and more defensive driving? I mean, yes, we can’t see as well in fog, but that’s why you should slow down.

by kube-system 2 hours ago

Again, I'm still not saying that humans don't make bad decisions. I'm saying that, unequivocally, they also get into accidents while paying attention and being careful, as a result of misinterpretation or failure of their senses. These accidents are also common, for example:

* someone parking carefully misjudges depth and bumps an object

* person driving at night, their eyes failed to perceive a poorly lit feature of the road/markings/obstacles

* person driving and suddenly blinded by bright object (the sun, bright lights at night)

* person pulling out in traffic who misinterprets their depth perception and therefore misjudges the speed of approaching traffic

* people can only focus their eyes at one distance at a time, and it takes time to refocus at a different distance. It is neither unsafe nor unexpected for humans to check their instruments while driving, but it can take the human eye hundreds of milliseconds to focus under normal circumstances. If you look down, focus, look back up, and focus again as quickly as you can at highway speeds, you will have travelled quite a long distance.

These types of failures can happen not as a result of poor decision making, but of poor perception.
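
To put rough numbers on the refocusing example: a minimal sketch, assuming a highway speed of 110 km/h, 300 ms per refocus, and half a second spent reading the instrument. All three values are illustrative assumptions, not figures from this thread, and the function name is hypothetical. It only multiplies speed by the time the eyes are not focused on the road.

```python
def distance_travelled_m(speed_kmh: float, eyes_off_road_s: float) -> float:
    """Distance covered (metres) at constant speed during the given time."""
    speed_ms = speed_kmh * 1000.0 / 3600.0  # convert km/h to m/s
    return speed_ms * eyes_off_road_s


# Assumed, illustrative values (not taken from the thread):
SPEED_KMH = 110.0  # a typical highway speed
REFOCUS_S = 0.3    # "hundreds of milliseconds" per refocus
GLANCE_S = 0.5     # time spent actually reading the instrument

# Look down and focus, read, then look up and focus again.
total_s = REFOCUS_S + GLANCE_S + REFOCUS_S
print(f"{distance_travelled_m(SPEED_KMH, total_s):.1f} m in {total_s:.1f} s")
```

Under those assumptions the car covers a few tens of metres before the driver's eyes are focused on the road again.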

by saltcured 6 hours ago

In theory, a computer should be able to do the same. It could do sensor fusion with even more sense modalities than we have. It could have an array of cameras and potentially out-do our stereo vision, or perhaps even use some lightfield magic to (virtually) analyze the same scene with multiple optical paths.

However, there is also a lot of interaction between our perceptual system and cognition. Just for depth perception, we're doing a lot of temporal analysis. We track moving objects and infer distance from assumptions about scale and object permanence. We don't just repeatedly make depth maps from 2D imagery.

The brute-force approach is something like training vision-language models (VLMs). E.g. you could train on lots of movies and be able to predict "what happens next" in the imaging world.

But, compared to LLMs, there is a bigger gap between the model and the application domain with VLMs. It may seem like LLMs are being applied to lots of domains, but most are just tiny variations on the same task of "writing what comes next", which is exactly what they were trained on. Unfortunately, driving is not "painting what comes next" in the same way as all these LLM writing hacks. There is still a big gap between that predictive layer, planning, and executing. Our giant corpus of movies does not really provide the ready-made training data to go after those bigger problems.

by dcrazy 3 hours ago

Putting your point another way, in order to replicate an average human driver’s competence you would need to make several strong advancements in the state of the art in computer vision _and_ digital optics.

by DesaiAshu 2 hours ago

In India (among other places), honking is essential to reducing crashes.

We often greatly underestimate / undervalue the role of our ears relative to vision. As my film director friend says, 80% of the impact in a movie is in the sound

by SOLAR_FIELDS 1 hour ago

The day a Waymo can functionally navigate the streets of Mumbai is the day we have really achieved L5.

by wagwang 2 hours ago

Most of what you said has nothing to do with lidar vs camera

by dzhiurgis 2 hours ago

At 20 meters away, motion vision is more accurate than stereoscopic vision. What is lidar helping to solve here?
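
For context on why stereo degrades with distance: under standard small-angle stereo geometry, depth uncertainty grows roughly with the square of distance, dZ ≈ Z² · dθ / B, where B is the baseline between the two viewpoints and dθ is the smallest resolvable disparity angle. A minimal sketch, assuming a human-like 6.5 cm baseline and a 20 arcsecond disparity threshold (both are illustrative assumptions, not figures from this thread, and the function name is hypothetical):

```python
import math


def stereo_depth_uncertainty_m(distance_m: float,
                               baseline_m: float = 0.065,
                               disparity_arcsec: float = 20.0) -> float:
    """Approximate stereo depth uncertainty: dZ ~= Z^2 * d_theta / B."""
    # Convert the assumed disparity threshold from arcseconds to radians.
    d_theta_rad = math.radians(disparity_arcsec / 3600.0)
    return distance_m ** 2 * d_theta_rad / baseline_m


# Uncertainty at a few distances, using the assumed defaults above.
for z in (5, 20, 80):
    print(f"{z:>3} m -> +/- {stereo_depth_uncertainty_m(z):.2f} m")
```

Doubling the distance quadruples the uncertainty in this model. It says nothing about how accurate motion-based cues are; it only illustrates the stereo side of the comparison.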

by dymk 2 hours ago

Waymo claims its system, which uses a combination of LIDAR & vision, resolves objects up to 500 meters away

https://waymo.com/blog/2024/08/meet-the-6th-generation-waymo...

This company claims their LIDAR works conservatively at 250m, and up to 750m depending on reflectivity

https://www.cepton.com/driving-lidar/reading-lidar-specs-par...