I'm oversimplifying here, but the macro process is taking some known attributes and mapping them onto what you're observing. For example, if you can detect people, and you know the average height of a person, you can compute where your horizon is and where you should (or shouldn't) expect to see people in the FOV. You can do this with cameras, lidar, etc. With multiple sensors you can do a lot more: have them each sample an object in their own way and converge on agreement about where the sensors sit relative to each other and to the object.
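To make the person-height trick concrete, here's a minimal sketch under a flat-ground pinhole model. The camera height and average person height are assumed numbers, and this is an illustration, not any particular vendor's method. The nice part is that the focal length cancels out, so the horizon row falls out of just the bounding boxes:

```python
import numpy as np

def estimate_horizon_row(detections, cam_height_m=1.4, person_height_m=1.7):
    """Estimate the image row of the horizon from person bounding boxes.

    Flat-ground pinhole model: a person of height H at distance Z appears
    h_box = f*H/Z pixels tall, and their feet project f*cam_h/Z pixels
    below the horizon row. Dividing out f/Z gives
        horizon_row ~= foot_row - h_box * (cam_h / H)
    with the focal length cancelled entirely.
    detections: list of (top_row, bottom_row) pixel rows per detected person.
    """
    estimates = [bottom - (bottom - top) * (cam_height_m / person_height_m)
                 for top, bottom in detections]
    # median is robust to outliers (children, cyclists, partial boxes)
    return float(np.median(estimates))

# toy detections all consistent with a horizon at row ~250
dets = [(190, 530), (213, 423), (227, 357)]
print(estimate_horizon_row(dets))  # ~250: don't expect ground-level people above this row
```

Run it over enough detections and the residual scatter also tells you something about how far off your current pitch estimate is.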
I see no reason that LiDAR couldn’t participate in a similar algorithm.
A bigger issue would be knowing the shape of the car to avoid clipping an obstacle.
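For the clipping part, a rough sketch of what that geometry check could look like: a rectangular footprint swept along sampled poses of a planned path. The half-extents and margin here are made-up numbers, and a real planner would use the actual vehicle outline:

```python
import numpy as np

def footprint_clips_obstacle(poses, obstacles,
                             half_length=2.5, half_width=1.0, margin=0.2):
    """Check whether a rectangular vehicle footprint clips any obstacle
    point along a planned path.

    poses: iterable of (x, y, heading) vehicle poses in the world frame.
    obstacles: array of (x, y) obstacle points in the same frame.
    """
    obstacles = np.asarray(obstacles, dtype=float)
    for x, y, theta in poses:
        # transform obstacle points into the vehicle frame at this pose
        c, s = np.cos(theta), np.sin(theta)
        dx, dy = obstacles[:, 0] - x, obstacles[:, 1] - y
        local_x = c * dx + s * dy     # forward axis
        local_y = -s * dx + c * dy    # left axis
        hit = (np.abs(local_x) <= half_length + margin) & \
              (np.abs(local_y) <= half_width + margin)
        if hit.any():
            return True
    return False
```

Get the footprint wrong (or the sensor-to-body extrinsics wrong) and this check passes while the bumper doesn't.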
At some point, with enough sensor suites in the field, we might be able to generalize better and get effective few-shot training for self-calibration of sensor suites.
I think the real reason Tesla is known to require a 10-minute calibration drive is that they shipped APHW2 long before the software matured, so they needed a way to calibrate after the cars were shipped "blank". Other manufacturers only ship finalized hardware and software, so they don't need a scalable, tool-free calibration method.
Anyway, my point is that Tesla cars need calibration like anything else. The same goes for any multi-sensor SLAM system, whether it uses sets of color cameras, a laser spinny thingy, laser flash cameras, a laser flash color camera thingy, or combinations thereof.
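And whatever the sensor mix, the shared core of that calibration is solving for extrinsics: the rigid transform between two sensors, recovered from matched observations of the same landmarks. A minimal sketch using the standard Kabsch least-squares alignment (the matched 3D points are assumed to come from whatever detection pipeline each sensor has):

```python
import numpy as np

def estimate_extrinsics(points_a, points_b):
    """Least-squares rigid transform (Kabsch) mapping sensor B's frame to
    sensor A's: finds R, t minimizing sum ||R @ b_i + t - a_i||^2 over
    matched landmark observations from the two sensors."""
    a, b = np.asarray(points_a, float), np.asarray(points_b, float)
    ca, cb = a.mean(axis=0), b.mean(axis=0)
    H = (b - cb).T @ (a - ca)                # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = ca - R @ cb
    return R, t
```

Whether points_a come from camera triangulation and points_b from a lidar, or vice versa, the math doesn't care; that's why the calibration problem looks so similar across sensor suites.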