Tag Archive for: ADS Testing

GoMentum Presents at SAE World Congress WCX 2020 Digital Summit

GoMentum Station presents two key papers at SAE WCX 2020, on testing automated driving systems, and on a novel collision avoidance safety metric.

We are excited to announce that research performed by AAA’s AV Testing team, in collaboration with two key partners, is being presented at the SAE World Congress 2020.

Of the many technical standards groups and industry conferences, SAE and World Congress stand apart. This year, in place of the Detroit event, we are excited to support SAE WCX virtually and showcase our research via the Digital Summit.

Both papers and their oral presentations are available for on-demand viewing on the SAE WCX website, in the Body / Chassis / Occupant and Pedestrian Safety Structure technical sessions category.

Modes of Automated Driving System Scenario Testing: Experience Report and Recommendations (SAE Paper 2020-01-1204) 

This research, performed in collaboration with University of Waterloo’s Professor Krzysztof Czarnecki, Michal Antkiewicz and team at the Waterloo Intelligent Systems (WISE) lab, explores testing autonomous vehicles using four different modes including simulation, mixed reality testing, and test-track testing. The team tested UW’s automated vehicle, dubbed “UW Moose”, through six rigorous scenario tests in different modes, and compared and contrasted their benefits and drawbacks. The paper closes with 12 recommendations on choosing testing modes for automated driving systems.

The SAE paper 2020-01-1204 may be purchased here

Development of a Collision Avoidance Capability Metric (SAE Paper 2020-01-1207)

This research paper discusses the development and application of a novel metric for evaluating and quantifying the capability of a vehicle / controller (including a human driver) to avoid potential future collisions. The metric was developed in partnership with DRI, and is applicable to potentially any scenario, including with multiple actors and roadside objects. 

The SAE paper 2020-01-1207 may be purchased here


To discuss these, and other research at GoMentum, feel free to contact Atul Acharya or Paul Wells at [email protected] 

Accelerating ADAS and AV Development with the GoMentum Digital Twin

Side-by-side of real world and virtual

GoMentum Station and Metamoto announce a new digital twin simulator to accelerate the development and testing of ADAS features and automated driving systems

Authored by Paul Wells

Applying autonomy to safety-critical applications (most notably, autonomous vehicles) requires extensive verification and validation. The majority of this testing takes place in scalable simulation tools, whereas much of it is still accomplished in the physical world. As such, correlation between these test modes remains an important consideration when evaluating and advancing the overall efficiency of test efforts.

Metamoto and GoMentum Station are excited to partner to offer the GoMentum Bunker City scene within Metamoto’s massively scalable, cloud-based simulator to help promote this connectivity. Using the GoMentum scene, developers are able to both drive efficiency and go deeper into specific validation subdomains. Contact us for more information about the digital twin.


Physical testing is notoriously resource intensive. To help drive costs down, developers and test engineers have two complementary strategies: perform as much virtual testing as possible, and identify the smallest, but still significant, set of physical tests required to produce meaningful assurance and test results. Simulation, when integrated with physical test environments, is an incredibly powerful ally in both of these efforts.


Virtual bunkers

In addition to supporting the overall development and advancement of an autonomous stack, virtual testing within a digital twin environment effectively allows for a faster and more efficient bug discovery process before deploying hardware and drivers into a physical environment. This reduces the likelihood of time-consuming troubleshooting that pops up during the precious time spent either on a track or on public roads. GoMentum, for example, is an unusual environment. The use of the Metamoto scene and simulator allows for virtual exploration of the environment such that static objects in the environment, like bunkers and tall grasses, and dynamic agents, such as cows, turkeys and the like, do not disrupt time spent on the track. Turnaround times thereby become faster, and the cycle between virtual and physical testing continues efficiently.

Empty road

Parity Testing

An added benefit of leveraging digital twins is the ability to examine parity, that is, how closely virtual and physical test results conform to one another. Due to the increasing importance of, and reliance on, simulation testing, understanding the relationship between virtual and physical test results is critical. This exploration of parity testing allows for greater awareness of the relationship between sample use cases and the model fidelities leveraged in simulation. While the goal of this awareness is principally safety, it also has the potential to drive even greater use of simulation tools.

As part of the new GoMentum Bunker City Digital Twin, GoMentum and Metamoto invite industry and academic participants to partake in research projects covering feature development, parity testing, safety testing, and more. 

Digital image


For more information, reach out to [email protected] and [email protected].

Understanding Unsettled Challenges in Autonomous Driving Systems

Recently published SAE EDGE Reports, co-authored by the test and research team at AAA Northern California, Nevada and Utah, highlight the key issues that the autonomous vehicle industry continues to face.

Authored by Atul Acharya

SAE EDGE Reports 2019
SAE EDGE Reports on Simulation and Balancing ADS Testing

The promise of highly automated vehicles (HAVs) has long been in reducing crashes, easing traffic congestion, ferrying passengers and delivering goods safely — all while providing more accessible and affordable mobility enabling new business models.

However, the development of these highly complex, safety-critical systems is fraught with extremely technical, and often obscure, challenges. These systems are typically composed of four key modules, namely:

(i) the perception module, which understands the environment around the automated vehicle using sensors like cameras, lidars, and radars,

(ii) the prediction module, which predicts where all other dynamic actors and agents (such as pedestrians, bicyclists, vehicles, etc.) will be moving in the next 2-10 seconds,

(iii) the planning module, which plans the AV’s own path, taking into account the scene and dynamic constraints, and

(iv) the control module, which executes the trajectory by sending commands to the steering wheel and the motors.
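The four modules above can be pictured as a simple chain in which each stage consumes the previous stage's output. The following is a minimal sketch only, not a description of any production stack: the `Actor` type, the function names, the constant-velocity prediction, the 5 m gap, and the proportional controller are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Actor:
    x: float   # distance ahead of the ego vehicle (m)
    vx: float  # longitudinal velocity (m/s)


def perceive(detections: List[Tuple[float, float]]) -> List[Actor]:
    """Perception: turn raw (distance, velocity) detections into actors."""
    return [Actor(x=x, vx=vx) for x, vx in detections]


def predict(actors: List[Actor], horizon_s: float) -> List[Actor]:
    """Prediction: ASSUMED constant-velocity extrapolation over the horizon."""
    return [Actor(x=a.x + a.vx * horizon_s, vx=a.vx) for a in actors]


def plan(ego_speed: float, predicted: List[Actor],
         horizon_s: float, gap_m: float = 5.0) -> float:
    """Planning: choose a target speed that keeps gap_m to the nearest actor."""
    if not predicted:
        return ego_speed
    nearest = min(a.x for a in predicted)
    max_travel = max(nearest - gap_m, 0.0)
    return min(ego_speed, max_travel / horizon_s)


def control(ego_speed: float, target_speed: float, gain: float = 0.5) -> float:
    """Control: ASSUMED proportional controller, returns acceleration (m/s^2)."""
    return gain * (target_speed - ego_speed)


def step(detections: List[Tuple[float, float]], ego_speed: float,
         horizon_s: float = 2.0) -> float:
    """Run perception -> prediction -> planning -> control once."""
    actors = perceive(detections)
    predicted = predict(actors, horizon_s)
    target = plan(ego_speed, predicted, horizon_s)
    return control(ego_speed, target)
```

For example, `step([(15.0, 0.0)], ego_speed=10.0)` asks the ego vehicle to slow down for a stationary object 15 m ahead, while an empty detection list leaves the speed unchanged. Real systems are far less linear than this chain suggests, but the hand-off between stages is the same.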

If you are an AV developer working on automated driving systems (ADS), or perhaps even advanced driver assistance systems (ADAS), you are already using various tools to make your job easier. These tools include simulators of various kinds — such as scenario designers, test coverage analyzers, sensor models, vehicle models — and their simulated environments. These tools are critical in accelerating the development of ADS. However, it is equally important to understand the challenges and limitations in using, deploying and developing such tools for advancing the benefits of ADS.

We in the AV Testing team at GoMentum actively conduct research in AV validation, safety, and metrics. Recently, we had an opportunity to collaborate with leading industry and academic partners to highlight these key challenges. SAE International organized two workshops in late 2019, convened by Sven Beiker, PhD, founder of Silicon Valley Mobility, to better understand various AV testing tools. The workshop participants included Robert Siedl (Motus Ventures), Chad Partridge (CEO, Metamoto), Prof. Krzysztof Czarnecki and Michał Antkiewicz (both of University of Waterloo), Thomas Bock (Samsung), David Barry (Multek), Eric Paul Dennis (Center for Automotive Research), Cameron Gieda (AutonomouStuff), Peter-Nicholas Gronerth (fka), Qiang Hong (Center for Automotive Research), Stefan Merkl (TUV SUD America), John Suh (Hyundai CRADLE), and John Tintinalli (SAE International), along with AAA’s Atul Acharya and Paul Wells.

Simulation Challenges

One of the key challenges encountered in developing ADS is in developing accurate, realistic, reliable and predictable models for various sensors (cameras, lidars, radars), and actors and agents (such as vehicles of various types, pedestrians, etc.) and the world environment around them. These models are used for verification and validation (V&V) of advanced features. Balancing the model fidelity (“realism”) of key sensors and sub-systems while developing the product is a key challenge. The workshop addressed these important questions:

  1. How do we make sure simulation models (such as for sensors, vehicles, humans, environment) represent real-world counterparts and their behavior?
  2. What are the benefits of a universal simulation model interface and language, and how do we get to it?
  3. What characteristics and requirements apply to models at various levels, namely, sensors, sub-systems, vehicles, environments, and human drivers?

To learn more about these and related issues, check out the SAE EDGE Research Report EPR2019007:


Balancing ADS Testing in Simulation Environments, Test Tracks, and Public Roads

If you are a more-than-curious consumer of AV news, you might be familiar with various AV developing companies stating proudly that they have “tested with millions or billions of miles” in simulation environments, or “millions of miles” on public roads. How do simulation miles translate into real-world miles? Which matters more? And why?

If you have thought of these questions, then the second SAE EDGE report might be of interest.

The second workshop focused on a broader theme: How should AV developers allocate their limited testing resources across different modes of testing, namely simulation environments, test tracks, and public roads? Each mode of testing has its own benefits and limitations, and can accelerate or hinder development of ADS accordingly, so a poorly balanced allocation of limited resources can be costly.

This report seeks to address the three most critical questions:

  1. What determines how to test an ADS?
  2. What is the current, optimal, and realistic balance of simulation testing and real-world testing?
  3. How can data be shared in the industry to encourage and optimize ADS development?

Additionally, it touches upon other challenges such as:

  • How might one compare virtual and real miles?
  • How often should vehicles (and their subsystems) be tested? And in what modes?
  • How might (repeat) testing be made more efficient?
  • How should companies share testing scenarios, data, methodologies, etc. across industry to obtain the maximum possible benefits?

To learn more about these and related challenges, check out the SAE EDGE Research Report EPR2019011:


Physical Test Efficiency via Virtual Results Analysis

AAA Northern California, UC Berkeley and LG Silicon Valley Labs partner to examine the use of digital twin and parameterized testing to drive efficiency in closed-course testing. For a summary of the study download the AAA-UCB-LG AV Testing Project Whitepaper.

Authored by Paul Wells

Physical test execution at GoMentum Station. Pedestrian scenario along Pearl St. in “Downtown” zone.

UPDATE: On August 20, 2020 UC Berkeley hosted a webinar on the research outlined in this article and tools leveraged in the work. Visit the YouTube link for an in-depth overview.

Simulation and physical testing are widely known to be complementary modes of automated vehicle verification & validation (V&V). While simulation excels in scalable testing of software modules, physical testing provides the ability for full-vehicle, everything-in-the-loop testing and “real life” evaluations of sensing stacks or downstream perception / sensor fusion layers. These tradeoffs, in part, contribute to the test distribution approach put forth by Safety First For Automated Driving (SaFAD).

Matrix that maps test modes to use cases

“Test Platform and Test Item”. Safety First for Automated Driving, Chapter 3: Verification & Validation (2019).

Due to the ability of simulation to scale rapidly, much work is currently underway to resolve its core limitation (fidelity). Developments in modeling (sensors, environment, vehicle dynamics, human drivers, etc.) and fast, photorealistic scene rendering all stand to expand the scope of validation exercises that can be performed virtually. Much less studied, however, are the means by which physical testing can improve upon its core limitation (efficiency). Whereas the speed of virtual testing allows for near constant re-submission and re-design of test plans, the physical world is much less forgiving. Being strategic about time spent on a track is therefore vital to maximize the efficiency of physical testing. Although many inputs into efficient physical testing are fixed (e.g. the inherent slowness in working with hardware), it is unclear whether the utility of physical test data also scales linearly alongside the volume of test execution. In other words, are the results of all physical tests equally valuable, or are there certain parameters within a given scenario which, if realized concretely at the track, would result in more valuable insights? If so, how might one discover these higher-value parameters using a virtual test environment?

These questions were central to our three-way research partnership between GoMentum Station (owned and operated by AAA Northern California), UC Berkeley VeHiCaL, and LG Silicon Valley Lab (LG SVL). We modeled a digital twin of GoMentum in the LG Simulator and leveraged an integration between Berkeley’s Scenic scenario description language (Scenic) and LG’s simulation platform. We selected a pedestrian crossing scenario and used the Scenic-LG integration to parameterize the scenario along several relevant vectors, executing nearly a thousand different virtual test cases. The results of one such test set were as follows, where rho is a proxy for the minimum distance between the ego vehicle and pedestrian target. Within this plot, we found the clustering patterns along rho to be most interesting. As such, we selected eight cases for physical test execution: two failures (F), three successes (S), and three marginal (M) cases, where failure cases exhibited minimum distance values of less than or equal to 0.4 meters. In short, the results from physical testing established safe/marginal/unsafe parity across test modes.

3-D plot depicting simulation test results

Each point represents a test case. X = pedestrian start delay (s), Y = pedestrian walk distance (m), Z = pedestrian hesitate time (s). Rho = proxy for minimum distance.
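The workflow described above (sweep the scenario parameters virtually, label each run by rho, then pick a handful of cases for the track) can be sketched in a few lines. This is an illustration under stated assumptions, not the study's toolchain: `toy_rho` is a hypothetical stand-in for a simulator run, the grid is far smaller than the nearly one thousand cases executed, a fixed grid replaces the Scenic-driven parameterization, and the 1.0 m marginal threshold is invented for the example. Only the 0.4 m failure threshold and the 2 F / 3 M / 3 S selection come from the article.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional

FAIL_MAX_RHO_M = 0.4      # from the article: failure cases had rho <= 0.4 m
MARGINAL_MAX_RHO_M = 1.0  # ASSUMPTION: the article gives no marginal threshold


@dataclass(frozen=True)
class Case:
    start_delay_s: float
    walk_distance_m: float
    hesitate_time_s: float
    rho_m: float


def toy_rho(start_delay_s: float, walk_distance_m: float,
            hesitate_time_s: float) -> float:
    """HYPOTHETICAL stand-in for one simulator run, not the study's model.

    Assumes the ego reaches the crossing at t = 4 s and the pedestrian
    walks at 1.4 m/s; rho grows with the gap between the arrival times.
    """
    ped_arrival_s = start_delay_s + hesitate_time_s + walk_distance_m / 1.4
    return 1.4 * abs(ped_arrival_s - 4.0)


def classify(rho_m: float) -> str:
    """Label a run as failure (F), marginal (M), or success (S)."""
    if rho_m <= FAIL_MAX_RHO_M:
        return "F"
    if rho_m <= MARGINAL_MAX_RHO_M:
        return "M"
    return "S"


def sweep() -> List[Case]:
    """Run the toy model over a small, fixed parameter grid."""
    delays = [0.5 * i for i in range(7)]        # 0.0 .. 3.0 s
    distances = [2.0, 3.0, 4.0, 5.0, 6.0]       # m
    hesitations = [0.25 * i for i in range(5)]  # 0.0 .. 1.0 s
    return [Case(d, w, h, toy_rho(d, w, h))
            for d, w, h in itertools.product(delays, distances, hesitations)]


def select_for_track(cases: List[Case],
                     quota: Optional[Dict[str, int]] = None) -> List[Case]:
    """Pick the lowest-rho cases per label, up to the given quota."""
    quota = quota or {"F": 2, "M": 3, "S": 3}
    picked: List[Case] = []
    for label, n in quota.items():
        group = sorted((c for c in cases if classify(c.rho_m) == label),
                       key=lambda c: c.rho_m)
        picked.extend(group[:n])
    return picked
```

Calling `select_for_track(sweep())` returns eight concrete parameter sets to recreate physically. Picking the lowest-rho cases within each label is just one possible selection rule; the article's own selection was guided by the clustering patterns in the plot.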

While the concept of correlating virtual and physical tests is not in itself novel, our results provide evidence to suggest that parametrization and analysis of virtual test results can be used to inform physical test plans. Specifically, the framework of recreating “low-rho” failure cases physically, within a reasonable degree of conformance to virtual runs, allows for the capture of rich ground-truth data pertaining to critical cases specific to the Vehicle Under Test (VUT), all without the guesswork involved in manually tuning scenario parameters at the physical test site. Because we were able to tease out deficiencies of the AV stack after running only ten-odd test cases, the utility of data captured relative to time spent onsite was significant. As compared to relatively unstructured physical test exercises or arbitrarily assigned scenario parameters, this framework represented an efficient means of both discovering and recreating physical critical cases.

Stepping outside this framework and the original research scope, our results also provide evidence of several challenges in validating highly automated vehicles (HAV). Even within our relatively low volume of tests, one test case (M2) produced a false negative when transitioning from virtual to physical testing — underscoring the importance of testing virtually and physically, as well as the difficulty in interpreting simulation results or using simulation results as a proxy for overall vehicle competence. We were also surprised by the exceptionally sensitive tolerances within each parameter. In certain cases the difference between a collision / no collision was a matter of milliseconds in pedestrian hesitation time, for instance. This underscores both the brittleness and, to a lesser extent, the non-determinism of machine learning algorithms — two of the broader challenges in developing AI systems for safety-critical applications.

Physical test execution at GoMentum Station. Pedestrian scenario along Pearl St. in “Urban Zone”.

Importantly, these challenges face industry and independent assessors alike. Current test protocols for agencies like EuroNCAP are very rigid. Vehicle speed, a primary test parameter in an AEB VRU test protocol for instance, varies along a step function with a range of ~20-60 kph in increments of 5 kph. While perhaps suitable for L2 systems where drivers remain the primary line of defense, this approach is clearly at odds with the parameter sensitivities exhibited above. If independent assessors hope to draw meaningful conclusions from the results of physical test exercises, these exercises will need to be highly contrived: not only will the scenarios need to be chosen according to a particular operational design domain (ODD), but the parameters used to concretize physical tests should in fact be assigned by inquiry (perhaps even VUT-specific inquiry) instead of by top-down, industry-wide mandate. This could necessitate an assessment framework built around coverage and safety case development rather than test-by-test scoring and comparison, an approach encouraged by efforts like UL 4600 and Foretellix.
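To make the contrast between a fixed step function and parameters "assigned by inquiry" concrete, here is a small sketch. It assumes a single pass/fail boundary in vehicle speed and uses a hypothetical oracle, `toy_passes`, with an invented boundary at 43.7 kph; in practice the oracle would be a virtual test run for a specific VUT. The grid endpoints follow the approximate 20-60 kph range quoted above.

```python
from typing import Callable, List, Optional, Tuple


def fixed_grid(lo_kph: int = 20, hi_kph: int = 60,
               step_kph: int = 5) -> List[int]:
    """Test speeds on a fixed step function, as in a rigid protocol."""
    return list(range(lo_kph, hi_kph + 1, step_kph))


def bracket_boundary(speeds: List[int],
                     passes: Callable[[float], bool]
                     ) -> Optional[Tuple[float, float]]:
    """Return the first adjacent pair of grid speeds whose outcome differs."""
    for a, b in zip(speeds, speeds[1:]):
        if passes(a) != passes(b):
            return (a, b)
    return None


def refine_boundary(lo: float, hi: float,
                    passes: Callable[[float], bool],
                    tol_kph: float = 0.1) -> Tuple[float, float]:
    """Bisect between lo and hi until the outcome flip is within tol_kph.

    ASSUMES passes(lo) != passes(hi) and a single flip in between.
    """
    lo_outcome = passes(lo)
    while hi - lo > tol_kph:
        mid = (lo + hi) / 2
        if passes(mid) == lo_outcome:
            lo = mid
        else:
            hi = mid
    return lo, hi


def toy_passes(speed_kph: float) -> bool:
    """HYPOTHETICAL oracle: the VUT avoids the collision below 43.7 kph."""
    return speed_kph < 43.7
```

With this oracle, the fixed grid only reveals that the outcome flips somewhere between 40 and 45 kph, whereas `refine_boundary(40, 45, toy_passes)` narrows the flip to within 0.1 kph in six further runs. Those runs are cheap in simulation, and the resulting boundary is the kind of VUT-specific parameter worth concretizing on the track.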

Many open questions remain in examining HAV test efficiency and overall HAV validation processes. We look forward to using the insights above in 2020 to delve deeper into research and to continue forging relationships with industry. We will be engaging in subsequent projects within the SAE IAMTS coalition to further explore both the toolchain and process used for correlating virtual and physical results.

This work also generated a number of outputs that we look forward to sharing with the industry. All videos from our track testing at GoMentum are available here, with recorded simulation runs here. The underlying datasets — both from ground truth at the track and outputs of the simulation — are also being used for further analysis and safety metric development.

Introducing The GoMentum Blog

The Bay Area has been the epicenter of the self-driving vehicle industry since Waymo’s Chauffeur project in 2013. Over the last seven-odd years the industry has evolved tremendously; sixty-six companies are currently permitted to test vehicles on California public roads. For Waymo and startups alike, public road testing is but one environment in a broader testing regime. Simulation, of course, is another dominant mode – with industry leaders citing millions or billions of virtual miles driven. Controlled testing on private roads is likewise a piece of a holistic approach to validation, and one that has been employed in the auto industry for decades (GM first bought Milford in 1936).

When the AV industry began, Bay Area teams were equipped primarily with research vehicles and needed only parking lots for controlled, full-vehicle testing before moving to public road driving. Over the last two years especially, with fleets of vehicles now driving on public roads and a focus on scale, most of the industry has outgrown parking lots. GoMentum, a military base turned proving ground since 2014, serves as a more robust tool for the growing needs of validation teams. AAA Northern California, Nevada & Utah became formally involved in 2018 and has since invested in infrastructure and operations to support efficient use of the site. We’re proud to partner with multiple companies.

As the use of GoMentum as a development tool grew, we also began to think about other opportunities – beyond the physical facility – that could advance the industry’s understanding of safety. Guided by many questions that have crystalized in the field – most notably, how safe is safe enough? – GoMentum created a research agenda in 2019. This agenda sought to complement the work of ISO 21448, UL 4600, Pegasus, and Safety First for Automated Driving by focusing on three key areas: methods for full-vehicle validation, the relationship between physical and virtual testing, and safety metric development.

Having wrapped several of these projects, we’re now excited to share our results. Please stay tuned as we continue to release more about our work and offer insights into questions like: How can AV safety be measured? What are the challenges introduced when distributing validation across physical and virtual environments? What is the state of the current validation toolchain? Which requirements are important for vehicle safety? What needs to be done by legislators and policy makers to ensure the safety of the public?