GoMentum Station and Metamoto announce a new digital twin simulator to accelerate the development and testing of ADAS features and automated driving systems
Authored by Paul Wells
Applying autonomy to safety-critical applications, most notably autonomous vehicles, requires extensive verification and validation. The majority of this testing takes place in scalable simulation tools, while much of formal validation and verification is still accomplished in the physical world. As such, correlation between these two test modes remains an important consideration when evaluating and advancing the overall efficiency of test efforts.
Metamoto and GoMentum Station are excited to partner to offer the GoMentum Bunker City scene within Metamoto's massively scalable, cloud-based simulator, strengthening this connection between virtual and physical testing. Using the GoMentum scene, developers can both drive efficiency and go deeper into specific validation subdomains. Contact us for more information about the digital twin.
Efficiency
Physical testing is notoriously resource intensive. To help drive costs down, developers and test engineers pursue two complementary strategies: perform as much virtual testing as possible, and identify the smallest, but still significant, set of physical tests required to produce meaningful assurance and test results. Simulation, when integrated with physical test environments, is an incredibly powerful ally in both of these efforts.
In addition to supporting the overall development and advancement of an autonomous stack, virtual testing within a digital twin environment allows for a faster, more efficient bug-discovery process before hardware and drivers are deployed into a physical environment. This reduces the likelihood of time-consuming troubleshooting cropping up during the precious time spent either on a track or on public roads. GoMentum, a former naval weapons station, is an unusual environment. Virtual exploration in the Metamoto scene and simulator lets teams account for static objects, like bunkers and tall grasses, and dynamic agents, such as cows and turkeys, before they can disrupt time spent on the track. Turnaround times thereby become faster, and the cycle between virtual and physical testing continues efficiently.
Parity Testing
An added benefit of leveraging digital twins is the ability to measure parity, that is, the conformance between virtual and physical test results. Given the increasing importance of, and reliance on, simulation testing, understanding this relationship is critical. Parity testing builds awareness of the relationship between sample use cases and the model fidelities leveraged in simulation. While the goal of this awareness is principally safety, it also has the potential to drive even greater use of simulation tools.
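As a concrete illustration, parity bookkeeping can be as simple as pairing each virtual run with its physical counterpart and counting agreements. The following is a minimal Python sketch; the scenario names and outcomes are hypothetical, not results from the GoMentum program.

```python
# Minimal sketch of parity bookkeeping between test modes.
# Scenario IDs and outcomes are hypothetical illustrations.
from collections import Counter

# Outcome of the same concrete scenario, run virtually and physically.
paired_results = [
    ("ped_crossing_01", "pass", "pass"),
    ("ped_crossing_02", "fail", "fail"),
    ("ped_crossing_03", "pass", "fail"),  # disagreement worth investigating
]

def parity_summary(results):
    """Count agreements and disagreements between virtual and physical runs."""
    counts = Counter()
    for scenario_id, virtual, physical in results:
        counts["agree" if virtual == physical else "disagree"] += 1
    return counts

print(parity_summary(paired_results))  # Counter({'agree': 2, 'disagree': 1})
```

Disagreements flagged this way are exactly the cases where model fidelity assumptions deserve a closer look.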
As part of the new GoMentum Bunker City Digital Twin, GoMentum and Metamoto invite industry and academic participants to partake in research projects covering feature development, parity testing, safety testing, and more.
Physical Test Efficiency via Virtual Results Analysis
AAA Northern California, UC Berkeley and LG Silicon Valley Labs partner to examine the use of a digital twin and parameterized testing to drive efficiency in closed-course testing. For a summary of the study, download the AAA-UCB-LG AV Testing Project Whitepaper.
Authored by Paul Wells
Physical test execution at GoMentum Station. Pedestrian scenario along Pearl St. in “Downtown” zone.
UPDATE: On August 20, 2020, UC Berkeley hosted a webinar on the research outlined in this article and the tools leveraged in the work. Visit the YouTube link for an in-depth overview.
Simulation and physical testing are widely understood to be complementary modes of automated vehicle verification and validation (V&V). While simulation excels at scalable testing of software modules, physical testing enables full-vehicle, everything-in-the-loop testing and "real life" evaluation of sensing stacks and downstream perception and sensor-fusion layers. These tradeoffs, in part, inform the test distribution approach put forth by Safety First For Automated Driving (SaFAD).
“Test Platform and Test Item”. Safety First for Automated Driving, Chapter 3: Verification & Validation (2019).
Due to the ability of simulation to scale rapidly, much work is currently underway to resolve its core limitation: fidelity. Developments in modeling (sensors, environment, vehicle dynamics, human drivers, etc.) and in fast, photorealistic scene rendering all stand to expand the scope of validation exercises that can be performed virtually. Much less studied, however, are the means by which physical testing can improve upon its own core limitation: efficiency.
Whereas the speed of virtual testing allows for near-constant re-submission and re-design of test plans, the physical world is much less forgiving, so being strategic about time spent on a track is vital to maximizing the efficiency of physical testing. Although many inputs into efficient physical testing are fixed (e.g. the inherent slowness of working with hardware), it is unclear whether the utility of physical test data scales linearly with the volume of test execution. In other words, are the results of all physical tests equally valuable, or are there certain parameters within a given scenario which, if realized concretely at the track, would yield more valuable insights? If so, how might one discover these higher-value parameters using a virtual test environment?
These questions were central to our three-way research partnership between GoMentum Station (owned and operated by AAA Northern California), UC Berkeley VeHiCaL, and LG Silicon Valley Lab (LG SVL). We modeled a digital twin of GoMentum in the LG Simulator and leveraged an integration between Berkeley's Scenic scenario description language (Scenic) and LG's simulation platform. We selected a pedestrian crossing scenario and used the Scenic-LG integration to parameterize it along several relevant vectors, executing nearly a thousand distinct virtual test cases. The results of one such test set are shown below, where rho is a proxy for the minimum distance between the ego vehicle and the pedestrian target. Within this plot, the clustering patterns along rho were most interesting, so we selected eight cases for physical test execution: two failures (F), three successes (S), and three marginal (M) cases, where failure cases exhibited minimum distance values less than or equal to 0.4 meters. In short, the results from physical testing established safe/marginal/unsafe parity across test modes.
Each point represents a test case. X = pedestrian start delay (s), Y = pedestrian walk distance (m), Z = pedestrian hesitate time (s). Rho = proxy for minimum distance.
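To make the workflow concrete, here is a minimal Python sketch of the sweep-and-select logic described above. The run_virtual_test stub, the grid values, and the 1.0 m marginal boundary are hypothetical stand-ins (the actual study used the Scenic-LG integration); only the 0.4 m failure threshold comes from the text above.

```python
# Hedged sketch of the virtual parameter sweep and case-selection logic.
# `run_virtual_test` stands in for one Scenic/LGSVL execution; it is a
# placeholder, not a real API call.
import itertools
import random

def run_virtual_test(start_delay, walk_distance, hesitate_time):
    """Placeholder for one simulated run; returns rho (proxy for min distance, m)."""
    random.seed(hash((start_delay, walk_distance, hesitate_time)))
    return random.uniform(0.0, 5.0)

# Sweep the three pedestrian parameters across a grid (~1000 cases).
grid = itertools.product(
    [0.5 * i for i in range(10)],  # pedestrian start delay (s)
    [2.0 + i for i in range(10)],  # pedestrian walk distance (m)
    [0.1 * i for i in range(10)],  # pedestrian hesitate time (s)
)
cases = [(params, run_virtual_test(*params)) for params in grid]

def label(rho, fail_max=0.4, marginal_max=1.0):
    # 0.4 m failure threshold is from the study; 1.0 m marginal
    # boundary is an assumed value for illustration.
    if rho <= fail_max:
        return "F"
    return "M" if rho <= marginal_max else "S"

# Pick a handful of concrete cases per label for physical execution
# (the study itself selected 2 F, 3 S, and 3 M cases).
by_label = {"F": [], "M": [], "S": []}
for params, rho in sorted(cases, key=lambda c: c[1]):
    bucket = by_label[label(rho)]
    if len(bucket) < 3:
        bucket.append((params, rho))
```

The selected cases, with their concrete parameter values, then become the physical test plan for the track.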
While the concept of correlating virtual and physical tests is not in itself novel, our results suggest that parameterization and analysis of virtual test results can be used to inform physical test plans. Specifically, recreating "low-rho" failure cases physically, within a reasonable degree of conformance to the virtual runs, allows for the capture of rich ground-truth data on critical cases specific to the Vehicle Under Test (VUT), all without the guesswork of manually tuning scenario parameters at the physical test site. Because we were able to tease out deficiencies of the AV stack after running only ten-odd test cases, the utility of the data captured relative to time spent onsite was significant. Compared to relatively unstructured physical test exercises or arbitrarily assigned scenario parameters, this framework represented an efficient means of both discovering and recreating physical critical cases.
Stepping outside this framework and the original research scope, our results also provide evidence of several challenges in validating highly automated vehicles (HAV). Even within our relatively low volume of tests, one test case (M2) produced a false negative when transitioning from virtual to physical testing, underscoring the importance of testing both virtually and physically, as well as the difficulty of interpreting simulation results or using them as a proxy for overall vehicle competence. We were also surprised by the exceptionally sensitive tolerances within each parameter: in certain cases the difference between collision and no collision was a matter of milliseconds in pedestrian hesitation time. This underscores both the brittleness and, to a lesser extent, the non-determinism of machine learning algorithms, two of the broader challenges in developing AI systems for safety-critical applications.
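One way to see how tight these tolerances are is to bisect a single parameter until the outcome flips. The sketch below assumes a hypothetical collides() wrapper around a simulator run; it is not part of the Scenic or LGSVL APIs, and the hidden boundary value is invented for illustration.

```python
# Localize the collision boundary in pedestrian hesitate time by
# bisection, narrowing the collision/no-collision flip to ~1 ms.
# `collides` is a hypothetical stand-in for one simulator run.

def collides(hesitate_time_s):
    """Hypothetical: run one simulation, return True on collision."""
    return hesitate_time_s < 0.7315  # illustrative hidden boundary

def find_boundary(lo, hi, tol_s=0.001):
    """Binary-search the hesitate time at which the outcome flips."""
    assert collides(lo) and not collides(hi)
    while hi - lo > tol_s:
        mid = (lo + hi) / 2.0
        if collides(mid):
            lo = mid
        else:
            hi = mid
    return lo, hi

print(find_boundary(0.0, 2.0))  # brackets the flip to within 1 ms
```

A boundary this sharp is precisely what makes coarse, fixed parameter grids a poor probe of HAV behavior.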
Physical test execution at GoMentum Station. Pedestrian scenario along Pearl St. in “Urban Zone”.
Importantly, these challenges face industry and independent assessors alike. Current test protocols from agencies like Euro NCAP are very rigid. Vehicle speed, a primary test parameter in an AEB VRU test protocol for instance, varies along a step function ranging from roughly 20 to 60 kph in 5 kph increments. While perhaps suitable for L2 systems, where drivers remain the primary line of defense, this approach clearly contradicts the parameter sensitivities exhibited above. If independent assessors hope to draw meaningful conclusions from the results of physical test exercises, those exercises will need to be highly contrived: not only will the scenarios need to be chosen according to a particular operational design domain (ODD), but the parameters used to concretize physical tests should in fact be assigned by inquiry, perhaps even VUT-specific inquiry, instead of by top-down, industry-wide mandate. This could necessitate an assessment framework built around coverage and safety-case development rather than test-by-test scoring and comparison, an approach encouraged by efforts like UL 4600 and work from groups like Foretellix.
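For scale, compare a fixed protocol grid with an inquiry-driven assignment of the kind suggested here. Everything in this sketch is hypothetical: virtual_min_distance stands in for a batch of per-VUT simulation runs, and the critical region near 44 kph is invented for illustration.

```python
# Sketch contrasting a fixed protocol grid with inquiry-driven assignment.
# `virtual_min_distance` is a hypothetical stand-in for per-VUT simulation.

def virtual_min_distance(speed_kph):
    """Hypothetical per-VUT result (m); smaller means more critical."""
    return abs(speed_kph - 43.7) / 10.0  # invented critical region near ~44 kph

# Top-down protocol: the same nine speeds for every vehicle.
protocol_grid = list(range(20, 65, 5))  # [20, 25, ..., 60]

# Inquiry-driven: sample finely in simulation, then run physically only
# the most critical candidates for this particular VUT.
candidates = [20 + 0.5 * i for i in range(81)]  # 0.5 kph resolution
ranked = sorted(candidates, key=virtual_min_distance)
physical_plan = sorted(ranked[:8])  # e.g. eight available track slots

print(protocol_grid)
print(physical_plan)  # clustered tightly around the critical region
```

A fixed grid spends most of its track slots far from where this particular vehicle is actually fragile; the inquiry-driven plan concentrates them where they matter.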
Many open questions remain in examining HAV test efficiency and overall HAV validation processes. We look forward to using the insights above to delve deeper into this research in 2020 and to continue forging relationships with industry. We will be engaging in subsequent projects within the SAE IAMTS coalition to further explore both the toolchain and the process used for correlating virtual and physical results.
This work also generated a number of outputs that we look forward to sharing with the industry. All videos from our track testing at GoMentum are available here, with recorded simulation runs here. The underlying datasets — both from ground truth at the track and outputs of the simulation — are also being used for further analysis and safety metric development.