I think I'm misunderstanding - they're converting video into their representation which was bootstrapped with LIDAR, video and other sensors. I feel you're alluding to Tesla, but Tesla could never have this outcome since they never had a LIDAR phase.

(edit - I'm referring to deployed Tesla vehicles, I don't know what their research fleet comprises, but other commenters explain that this fleet does collect LIDAR)

reply
They can and they do.

https://youtu.be/LFh9GAzHg1c?t=872

They've also built it into a full neural simulator.

https://youtu.be/LFh9GAzHg1c?t=1063

I think what we are seeing is that they both converged on the correct approach, one of them decided to talk about it, and it triggered disclosure all around since nobody wants to be seen as lagging.

reply
I watched that video around both timestamps and didn't see or hear any mention of LIDAR, only of video.
reply
Exactly: they convert video into a world model representation suitable for 3D exploration and simulation without using LIDAR (except perhaps for scale calibration).
reply
My mistake - I misinterpreted your comment, but after re-reading more carefully, it's clear that the video confirms exactly what you said.
reply
tesla is not impressive, I would never put my child in one
reply
Tesla does collect LIDAR data (people have seen them doing it, it's just not on all of the cars) and they do generate depth maps from sensor data, but from the examples I've seen it is much lower resolution than these Waymo examples.
reply
Tesla does it to build high-definition maps of the areas where their cars try to operate.
reply
Tesla uses lidar to train their models to generate depth data out of camera input. I don’t think they have any high definition maps.
reply
The purpose of lidar is to provide error correction when you need it most, i.e. when camera accuracy degrades.

Humans do this, just in the sense of depth perception with both eyes.

reply
Human depth perception uses stereo out to only about 2 or 3 meters, after which the distance between your eyes is not a useful baseline. Beyond 3m we use context clues and depth from motion when available.
reply
Thanks, saved some work.

And I'll add that in practice it is not even that much unless you're doing some serious training, like a professional athlete. For most tasks, accurate depth perception from stereo fades beyond roughly arm's length.

reply
ok, but a car is a few meters wide, isn't that enough to give driving depth perception similar to humans?
reply
The depths you are trying to estimate are to the other cars, people, turnings, obstacles, etc. Could be 100m away or more on the highway.
reply
ok, but the point being made is based on human depth perception, while a car's baseline is limited only by the width of the vehicle, so there's missing information if you're trying to figure out whether a car can use cameras to do what human eyes/brains do.
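
Rough numbers as a sketch, using the usual stereo relation z = f*B/d, so a disparity error of dd pixels gives a depth error of roughly z^2 * dd / (f*B). The baseline, focal length, and noise figures below are assumptions for illustration, not real specs of any car or camera:

    # Back-of-the-envelope stereo depth error. All numbers are assumptions.
    def depth_error(z_m, baseline_m, focal_px=1000, disparity_noise_px=0.25):
        """Approximate depth uncertainty at range z_m for a stereo pair."""
        return (z_m ** 2) * disparity_noise_px / (focal_px * baseline_m)

    for baseline_m, label in [(0.065, "human eyes"), (1.5, "car-width cameras")]:
        for z in (10, 50, 100):
            err = depth_error(z, baseline_m)
            print(f"{label:18s} z={z:3d} m -> ~{err:5.1f} m depth error")

With those made-up numbers, a car-width baseline still gives meter-level depth at 100 m while the eye baseline is hopeless out there, which is roughly the disagreement in this subthread.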
reply
(Always worth noting, human depth perception is not just based on stereoscopic vision, but also with focal distance, which is why so many people get simulator sickness from stereoscopic 3d VR)
reply
In fact there are even more depth perception cues. Maybe the most obvious is size (retinal versus assumed real-world size). Further examples include motion parallax, linear perspective, occlusion, shadows, and light gradients.

Here is a study on how these cues rank when it comes to (hand) reaching tasks in VR: https://pubmed.ncbi.nlm.nih.gov/29293512/

reply
> Always worth noting, human depth perception is not just based on stereoscopic vision, but also with focal distance

Also subtle head and eye movements, which is something a lot of people like to ignore when discussing camera-based autonomy. Your eyes are always moving around which changes the perspective and gives a much better view of depth as we observe parallax effects. If you need a better view in a given direction you can turn or move your head. Fixed cameras mounted to a car's windshield can't do either of those things, so you need many more of them at higher resolutions to even come close to the amount of data the human eye can gather.

reply
I keep wondering about the focal depth problem. It feels potentially solvable, but I have no idea how. I keep wondering if it could be as simple as a Magic Eye Autostereogram sort of thing, but I don't think that's it.

There have been a few attempts at solving this, but I assume that for some optical reason actual lenses need to be adjusted and it can't just be a change in the image? Meta had "Varifocal HMDs" being shown off for a bit, which I think literally moved the screen back and forth. There were a couple of "Multifocal" attempts with multiple stacked displays, but that seemed crazy. Computer Generated Holography sounded very promising, but I don't know if a good one has ever been built. A startup called Creal claimed to be able to use "digital light fields", which basically project stuff right onto the retina, which sounds kinda hogwashy to me but maybe it works?

reply
Actually the reason people experience simulator sickness in VR is not focal depth but the dissonance between what their eyes are telling them (vection) and what their inner ear and tactile senses are telling them.

It's possible they get headaches from the focal length issues but that's different.

reply
My understanding is that contextual clues are a big part of it too. We see the pitcher wind up and throw a baseball at us more than we stereoscopically track its progress from the mound to the plate.

More subtly, a lot of depth information comes from how big we expect things to be, since everyday life is full of things we intuitively know the sizes of: frames of reference in the form of people, vehicles, furniture, etc. This is why the forced perspective of theme park castles is so effective: our brains want to see those upper windows as full sized, so we see the thing as 2-3x bigger than it actually is. And in the other direction, a lot of buildings in Las Vegas are further away than they look, because hotels like the Bellagio have large black boxes on them that each group a 2x2 block of the actual room windows.

reply
Another way humans perceive depth is by moving our heads and perceiving parallax.
reply
How expensive is their lidar system?
reply
Hesai has driven the cost into the $200 to $400 range now. That said, I don't know what the units needed for driving cost. Either way, we've gone from thousands or tens of thousands of dollars down into the hundreds.
reply
Looking at prices, I think you are wrong and automotive Lidar is still in the 4 to 5 figure range. HESAI might ship Lidar units that cheap, but automotive grade still seems quite expensive: https://www.cratustech.com/shop/lidar/
reply
Those are single-unit prices. The AT128 for instance, which is listed at $6250 there and is widely used by several Chinese car companies, was around $900 per unit in high volume, and over time they lowered that to around $400.

The next generation of that, the ATX, is the one they have said would be half that cost. According to regulatory filings in China, BYD will be using it on entry-level $10k cars.

Hesai got the price down for their new generation by several optimizations. They are using their own designs for lasers, receivers, and driver chips which reduced component counts and material costs. They have stepped up production to 1.5 million units a year giving them mass production efficiencies.

reply
That model only has a 120 degree field of view so you'd need 3-4 of them per car (plus others for blind spots, they sell units for that too). That puts the total system cost in the low thousands, not the 200 to 400 stated by GP. I'm not saying it hasn't gotten cheaper or won't keep getting cheaper, it just doesn't seem that cheap yet.
reply
Waymo does their LiDAR in-house, so unfortunately we don’t know the specs or the cost
reply
Otto and Uber and the CEO of https://pronto.ai do though (tongue-in-cheek)

> Then, in December 2016, Waymo received evidence suggesting that Otto and Uber were actually using Waymo’s trade secrets and patented LiDAR designs. On December 13, Waymo received an email from one of its LiDAR-component vendors. The email, which a Waymo employee was copied on, was titled OTTO FILES and its recipients included an email alias indicating that the thread was a discussion among members of the vendor’s “Uber” team. Attached to the email was a machine drawing of what purported to be an Otto circuit board (the “Replicated Board”) that bore a striking resemblance to – and shared several unique characteristics with – Waymo’s highly confidential current-generation LiDAR circuit board, the design of which had been downloaded by Mr. Levandowski before his resignation.

The presiding judge, Alsup, said, "this is the biggest trade secret crime I have ever seen. This was not small. This was massive in scale."

(Pronto connection: Levandowski got pardoned by Trump and is CEO of Pronto autonomous vehicles.)

https://arstechnica.com/tech-policy/2017/02/waymo-googles-se...

reply
We know Waymo reduced their LiDAR price from $75,000 to ~$7500 back in 2017 when they started designing them in-house: https://arstechnica.com/cars/2017/01/googles-waymo-invests-i...

That was 2 generations of hardware ago (4th gen Chrysler Pacificas). They are about to introduce 6th gen hardware. It's a safe bet that it's much cheaper now, given how mass produced LiDARs cost ~$200.

reply
Less than the lives it saves.
reply
Cheaper every year.
reply
Exactly.

Tesla told us their strategy was vertical integration and scale to drive down all input costs in manufacturing these vehicles...

...oh, except lidar, that's going to be expensive forever, for some reason?

reply
> Humans do this, just in the sense of depth perception with both eyes.

Humans do this with vibes and instincts, not just depth perception. When I can't see the lines on the road because there's too much snow, I can still interpret where they would be based on my familiarity with the roads and my implicit knowledge of how roads work, for example. We do similar things for heavy rain or fog, although sometimes those situations truly necessitate pulling over, or slowing down and turning on your 4-ways - lidar might genuinely give an advantage there.

reply
That’s the purpose of the neural networks
reply
Yes and no - vibes and instincts aren't just thought, they're real senses. Humans have a lot of senses, dozens of them, including balance, pain, the sense of the passage of time, and body orientation. Not all of these senses are represented in autonomous vehicles, and it's not really clear how the brain mashes together all these senses to make decisions.
reply
deleted
reply
That is still important for safety reasons in case someone uses a LiDAR jamming system to try to force you into an accident.
reply
It’s way easier to “jam” a camera with bright light than a lidar, which uses both narrow band optical filters and pulsed signals with filters to detect that temporal sequence. If I were an adversary, going after cameras is way way easier.
reply
Oh yeah, point a q-beam at a Tesla at night, lol. Blindness!
reply
If somebody wants to hurt you while you are traveling in a car, there are simpler ways.
reply
I think there are two steps here: converting video into sensor data, and using that sensor data to drive. Only the second step will be handled by cars on the road; the first one is purely for training.
reply
Autonomous cars need to be significantly better than humans to be fully accepted, especially when an accident does happen. Hence limiting yourself to only cameras is futile.
reply
They may be trying to suggest that, but that claim does not follow from the quoted statement.
reply
I've always wondered... if Lidar + Cameras is always making the right decision, you should theoretically be able to take the output of the Lidar + Cameras model and use it as training data for a Camera only model.
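
A minimal sketch of that idea (teacher-student distillation with toy tensors; the shapes, names, and training details are made up and don't reflect what Waymo or Tesla actually do):

    import torch
    import torch.nn as nn

    # Toy stand-ins so the sketch runs end to end; real perception stacks are
    # obviously far bigger. "Features" here are just random vectors.
    teacher = nn.Linear(16 + 8, 4)   # pretend: camera + lidar features -> output
    student = nn.Linear(16, 4)       # camera features only
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)

    teacher.eval()
    for _ in range(100):
        cam = torch.randn(32, 16)     # fake camera features
        lidar = torch.randn(32, 8)    # fake lidar features
        with torch.no_grad():
            target = teacher(torch.cat([cam, lidar], dim=1))  # teacher's output as the label
        loss = nn.functional.mse_loss(student(cam), target)   # distill into the camera-only net
        opt.zero_grad()
        loss.backward()
        opt.step()

The catch, as others note below, is that the student can only ever approximate the teacher on inputs where the cameras carry enough information in the first place.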
reply
That's exactly what Tesla is doing with their validation vehicles, the ones with Lidar towers on top. They establish the "ground truth" from Lidar and use that to train and/or test the vision model. Presumably more "test", since they've most often been seen in Robotaxi service expansion areas shortly before fleet deployment.
reply
Is that exactly true though? Can you give a reference for that?
reply
I don't have a specific source, no. I think it was mentioned in one of their presentations a few years back that they use various techniques to get "ground truth" for vision training; among those were time series consistency (depth change over time should be continuous, etc.) and iirc also "external" sources for depth data, like LiDAR. And their validation cars equipped with LiDAR towers are definitely being seen everywhere they are rolling out their Robotaxi services.
reply
> are definitely being seen everywhere they are rolling out their Robotaxi services

So...nowhere?

reply
deleted
reply
> you should theoretically be able to take the output of the Lidar + Cameras model and use it as training data for a Camera only model.

Why should you be able to do that, exactly? Human vision is frequently tricked by its lack of depth data.

reply
"Exactly" is impossible: there are multiple Lidar samples that would map to the same camera sample. But what training would do is build a model that could infer the most likely Lidar representation from a camera representation. There would still be cases where the most likely Lidar for a camera input isn't a useful/good representation of reality, e.g. a scene with very high dynamic range.
reply
No, I don't think that will be successful. Consider a day where the temperature and humidity is just right to make tail pipe exhaust form dense fog clouds. That will be opaque or nearly so to a camera, transparent to a radar, and I would assume something in between to a lidar. Multi-modal sensor fusion is always going to be more reliable at classifying some kinds of challenging scene segments. It doesn't take long to imagine many other scenarios where fusing the returns of multiple sensors is going to greatly increase classification accuracy.
reply
Sure, but those models would never have online access to information only provided in lidar data…
reply
No, but if you run a shadow or offline camera-only model in parallel with a camera + LIDAR model, you can (1) measure how much worse the camera-only model is so you can decide when (if ever) it's safe enough to stop installing LIDAR, and (2) look at the specific inputs for which the models diverge and focus on improving the camera-only model in those situations.
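
A hedged sketch of what that shadow comparison could look like (the models, threshold, and per-frame output are all placeholders, not anyone's actual pipeline):

    import torch

    # Run the camera-only model alongside the camera+lidar model on the same
    # logged frames and flag the frames where they disagree the most.
    def find_divergent_frames(frames, fused_model, camera_model, threshold=0.5):
        hard_cases = []
        for frame_id, (cam, lidar) in enumerate(frames):
            with torch.no_grad():
                gap = (fused_model(cam, lidar) - camera_model(cam)).abs().max().item()
            if gap > threshold:
                hard_cases.append((frame_id, gap))  # route these to review / retraining
        return hard_cases

    # toy usage with stand-in "models"
    frames = [(torch.randn(8), torch.randn(8)) for _ in range(20)]
    print(find_divergent_frames(frames,
                                fused_model=lambda c, l: c + 0.3 * l,
                                camera_model=lambda c: c))

The flagged frames give you both the safety metric in (1) and the targeted training set in (2).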
reply