GoMentum Station and AAA Northern California, Nevada & Utah partner with University of Waterloo Center for Automotive Research (WatCAR) to validate AV safety performance and compare results across virtual and physical scenario tests.
Authored by Paul Wells
A key challenge in validating autonomous vehicles — and, more broadly, safety-critical applications of AI — is the brittleness of machine learning (ML) based perception algorithms  . This challenge is significant because errors in sensing & perception affect all downstream modules; if a system cannot properly detect its environment, it will have a diminished likelihood of completing its goal within this same environment. Although focused on full-vehicle testing via structured scenarios, our work with University of Waterloo highlighted this key issue.
Our research subjected Waterloo’s automated research vehicle, Autonomoose , to several predefined scenarios: roundabout with road debris, traffic jam assist, standard pedestrian crossing, intersection pedestrian crossing, and stop sign occlusion.
These scenarios were tested first in simulation using the University’s WiseSim platform and the GeoScenario SDL, then re-tested physically at a closed-course. Although intended as a broad exploration of the utility of controlled, physical testing for autonomous vehicle validation, this project nonetheless surfaced findings which — albeit specific to our given tech-stack — reinforce the otherwise well-documented challenges in validating autonomous systems.
A few highlights of our experience paper, specifically as related to test result discrepancies due to the perception system, can be summarized as follows:
Simulation architecture in this case provided “perfect perception”. As such, our virtual tests assumed that all actors in were perceived and understood by the system, in this case a modified instance of Baidu’s Apollo. This assumption led to large discrepancies between virtual and physical tests, especially in scenarios containing pedestrians. In our case, once the vehicle was introduced to the physical environment, perception-related deficiencies resulted in a large number of inconsistent or completely missed detection. During the pedestrian crossing, for instance, the AV struggled with intermittent loss of pedestrian perception due to issues with the object detection module. In simulation, however, the pedestrian was readily detected. Our video at the top of the fold shows performance on the physical track, while the below shows behavior in simulation.
Sensor fidelity in simulation was limited. Further highlighting the importance of closely tracked sensor model fidelity, the modeled LIDAR beam pattern used in simulation did not match the real specs of the on-vehicle sensor. This issue was uncovered due to conflicting virtual-physical test results within a road debris scenario, designed to assess the vehicle’s ability to detect small, stationary objects. As described in our paper, “The scenario exposed some lack of fidelity in the simulated lidar which caused inconsistent road debris detection behavior in simulation compared to the closed course. The lidar mounted on the roof of the SV had a non-linear vertical beam distribution, whereby the beams were grouped more densely around the horizon, and were too sparse to reliably detect the debris on the ground. In contrast, the simulated lidar had a linear vertical beam distribution, i.e., the beams were spaced out evenly. Consequently, implementing the non-linear beam distribution in the simulated lidar resulted in SV behavior in simulation consistent with the SV behavior on the closed course.” Pictured below, the Autonomoose fails to detect a cardboard box on the road during physical testing.
Environmental fidelity was limited. Finally, our use of WiseSim primarily involved a re-creation of the road network — but not the visual scene — present during closed-course testing. This introduced small complexities when Autonomoose was ultimately introduced into the physical track. Principally, uneven road slopes at the physical track created unexpected LiDAR returns and false detections onboard the vehicle. Because WiseSim did not recreate ground plane measurements, we encountered a bit of de-bugging at the track. This reiterates the need for close tracking of virtual model fidelity when using simulation to prepare for track tests and, more broadly, when performing correlations between virtual & physical tests.
Although these findings may not all generalize to the broader domain of AV testing, they nonetheless provide concrete instances of theoretical challenges facing the field. We will continue exploring these challenges with industry partners and sharing results. We also welcome inquiries about the scenarios used, technical assets created, and data sharing.
For the full paid SAE report visit their website to download.
For further info about the GoMentum testing team, please email us!