Gomentum Station Responds to New AV Research

By the GoMentum Station Team

For over a century, AAA has been an advocate for transportation safety. In keeping with this mission, AAA Northern California operates GoMentum Station, the largest dedicated automated vehicle testing facility in the US.

GoMentum Station has engaged in and provided perspective on Automated Driving Systems (ADS) safety research since 2019. Today, Waymo released new ADS research in which it reconstructed real-world collisions in simulation and tested its technology in those scenarios. GoMentum Station supports data sharing, which enables the industry to improve vehicle safety collectively.

AAA Northern California, via GoMentum Station, regularly engages in safety research for ADS, including collaborations with leading industry and academic partners. In March 2019, GoMentum Station organized its first workshop on AV safety metrics. The workshop underscored the need for a more transparent ADS safety evaluation process, data sharing for safety research, and additional standardized safety assessments, including safety metrics. GoMentum Station has pursued engagements in each of these areas.

Therefore, GoMentum Station would like to highlight the following points from this research:

  • Waymo’s reconstruction of high-severity collisions from the real world in simulation is highly relevant and impactful for ADS development. GoMentum Station’s research into novel safety metrics for collision avoidance and formal validation methodologies for ADS underscores the importance of the latest safety metrics and methodologies, as well as the correlation of simulation with real-world testing.
  • A combination of testing modes is needed to assess ADS safety. Waymo uses three testing methods, namely simulation, closed-track testing, and public-road testing, to develop its technology. GoMentum’s collaboration with the University of Waterloo, presented at the SAE World Congress in 2020, aligns with this approach.
  • Sharing safety scenarios and testing results across the industry is critical for validating ADS. GoMentum Station is engaged in a collaborative effort to safely deploy AVs using a data-driven policy framework via the World Economic Forum’s Safe Drive Initiative. To continue our pursuit of safer autonomous vehicles, we welcome collaboration and information sharing across the industry, including with Waymo.

GoMentum is the largest dedicated connected and automated vehicle testing facility in the US, located 65 miles from Silicon Valley. Owned and operated by AAA Northern California, the proving ground has over 20 miles of roads and highways set on 2,100 acres. Vehicle technology tested at GoMentum Station, including ADAS features evaluated in partnership with AAA National, will redefine the next generation of transportation, bring unprecedented mobility options to people, and advance traffic safety toward zero fatalities. Learn more at GoMentumStation.net.

A Roadmap for Accelerating Safe Automated Driving Deployment

By Atul Acharya

The automated vehicle (AV) industry has been grappling with critical questions regarding the development of automation for some time now. Alongside the industry, regulators in various jurisdictions have been grappling with their own concerns. Namely, how might the benefits of automation be disseminated widely, yet safely, among the general public? How should regulators decide which vehicles should be deployed? How should AVs be evaluated for safety in the first place? What should a technical evaluation process that is fair to all look like? These questions are hardly the regulators’ alone; in fact, AV developers and other stakeholders share similar concerns, primarily because they are all either direct beneficiaries of, or directly responsible for creating, safe technologies.

Earlier this year, the World Economic Forum (WEF) launched an initiative to help regulators — local, national and international — create a data-driven policy framework to help address these questions. In partnership with the consulting firm McKinsey & Co. and technology platform company Deepen.ai, the Forum launched the Safe Drive Initiative (SafeDI) to formalize just such a framework. The Forum invited several industry participants, including AAA Northern California’s Autonomous Vehicle team, several leading AV developers, policy makers, academics and safety experts, to help develop this framework.

AAA Northern California’s team was led by Atul Acharya, Director of AV Strategy and Operations, and Xantha Bruso, Manager of AV Policy. As a key contributor to the steering committee, the team helped guide the framework development by asking big questions. The expertise gained from testing AVs at GoMentum Station was critical in helping develop the scenario-based assessment framework. Going deeper, the committee asked critical questions, such as:

  • How should AVs be evaluated for safe automation? 
  • How should the operational design domain (ODD) be specified such that an equivalence can be established between testing mode and deployment mode?
  • How should regulators design a scenario-based assessment framework, given that the vast majority of scenarios (approximately 1–10 million) may never be tested on roads?
  • What combination of testing modes — such as simulation, closed-course testing, and open road testing — should the AVs be tested in?
  • What safety metrics matter when AV performance is assessed? And which metrics should be made public?
  • How should regulators ask for such metrics when they do not necessarily know all the technical implementation details?
  • What is the role of independent testing in evaluating AVs?
  • How should scenarios be shared within the industry so that safety is not a competition but an essential requirement?

Over the course of 2020, the steering committee met monthly to guide the framework development process. The committee created several technical work groups, composed of experts from academia and industry, that each explored various technical aspects of the framework, such as defining the ODD; elaborating scenario-based assessment; exploring available and upcoming technical safety standards, such as ANSI/UL 4600; and examining AV policy regimes, with examples ranging from light-touch (e.g., US-based) to high-touch (e.g., Singapore/EU-based) approaches, and identifying gaps in these policies.

The group defined a four-stage, graduated approach to testing and assessing AVs, taking into account the requirements from various stakeholders, including the general public, the ultimate beneficiaries of automation. Broadly speaking, the Safe Drive Initiative seeks to improve regulators’ decision-making abilities on automated vehicles technologies. 

The guiding principles of the framework include:

  • Multi-stakeholder approach – regulators and AV developers should benefit from the framework and find the guidance both practical and implementable
  • Scenario-based assessment – use of key scenarios within deployment ODD to evaluate the AV’s performance, while noting such a scenario-database would be a starting point, not an end-goal
  • Common set of metrics – leveraging a common set of metrics for AV assessment, such as ODD excursions, operational safety, and more (some developed, others still emerging in new standards) 
  • Covering simulation, closed-course testing, and on-road testing – using all three modes for evaluation to ensure efficiency and effectiveness of testing

The approach defined in the SafeDI framework is broadly divided into four stages:

  1. Prepare: convene necessary stakeholders, define the end goal, and establish process
  2. Define: establish the required behavioral competencies for the AV, define geographic areas, and parameters for each interim milestone
  3. Measure: specify on-road, controlled-environment, and simulation tests, and determine success/advancement criteria
  4. Execute: conduct tests, collect required data from AV developers as necessary, and improve the safety assurance process as needed

This framework is designed to provide high-level guidance to regulators. As such, it is flexible enough for regulators to adapt to their jurisdictions, and detailed enough to accommodate underlying technology changes. The committee recognizes that no one-size-fits-all solution will suffice for all jurisdictions, and that customization at each stage will be balanced with standardization and harmonization at the highest levels.

For full details of the policy framework, refer to WEF’s website at:

Safe Drive Initiative 

Safe Drive Initiative: Creating safe autonomous vehicle policy 

Safe Drive Initiative: Scenario-based AV Policy Framework 



The SafeDI framework enables regulators to evaluate AVs, potentially by independent testing organizations, such that regulators may focus their efforts on guiding AV developers, rather than performing the tests themselves. As such, this framework encourages the use of new and upcoming standards, such as ANSI/UL 4600 in safety evaluation of AVs. 

It is our hope that this approach will lead to a safer, more inclusive deployment of automated vehicles.

RAND releases commentary on ODD policy after research with AAA Northern CA

RAND and AAA Northern California conducted research into operational design domains (ODDs), which informed RAND’s commentary in support of state and local policies for AV safety.

Authored by Xantha Bruso

October 5, 2020

Last year, AAA Northern California convened a workshop on automated vehicle (AV) safety metrics, which was facilitated by the RAND Corporation (RAND) and Securing America’s Future Energy (SAFE). Individuals from 20 different AV-related companies, academic institutions and non-profits participated. The goals of this workshop were to:

1) Create alignment among industry – or at least among a critical subset – on the value of collaboratively supporting targeted research on AV safety metrics

2) Communicate the potential for roadmanship, as articulated by RAND in a 2018 report,[1] and/or related metrics to describe AV safety performance, and

3) Gather industry feedback and suggestions for ideas to improve understanding of these metrics and their role in communicating about AV safety.

One of the topics that emerged from the workshop was the need for a more rigorous and standardized definition of operational design domain (ODD), which was both required for and would advance several different AV safety metrics. Based on this finding, the project team (AAA Northern California, RAND, and SAFE) conducted a literature review and individual interviews with several workshop participants to further define the scope of a project to investigate ODD conceptualization. The hypothesis that emerged was that defining ODDs in a consistent manner would support the analysis of AV data such that ODD-specific safety metrics could be calculated, which would help illustrate the performance of different AVs (or different versions of the same AV) in a more accurate way.

Using this hypothesis, the project team reviewed additional literature and conducted more detailed stakeholder interviews[2] to develop the whitepaper found below, recently published on RAND’s website, which 1) describes how ODDs are currently used, and 2) explores how ODD conceptualization could be expanded or modified to support ODD priorities.

Since an ODD forms the basis of an AV’s operating model, which is used to specify driving tasks and requirements and to verify and validate ADS behavior, ODDs are a key element of AV safety. A main point of this report is that since ODDs help convey an AV’s capabilities and limitations, and since customer trust is earned by living up to expectations, clearly communicating to consumers when, where and how an AV will function is critical to building trust in this emerging technology.

Another point is that while ODDs can have innumerable categories and subcategories (helpfully contextualized by SAE’s recent best practice document on an ODD conceptual framework and lexicon),[3] an ODD can never comprehensively cover everything an AV may encounter. Therefore, an AV can find itself in situations for which it is unprepared. As such, testing to ensure an AV can perform safely within its ODD, recognize when it is inside and outside its ODD, and respond safely to being outside its ODD is a critically important part of making an AV’s safety case.

To support an AV’s safety case, the report proposes an ODD verification function – a feature that can help an AV decide whether and how to devolve to a minimum risk condition. Creating such a feature would support a safety case by providing another layer of redundancy that checks on an AV’s ODD by relying on the vehicle’s sensors to assess the compatibility of the AV’s environment against its basic operational requirements.

To do this, an AV’s ODD must be characterized in a way that is detectable by an AV, technology neutral, and testable by a third party. By focusing on concepts like “visibility” (e.g., the AV’s ability to perceive an object 100 feet away with 95% accuracy while traveling at 35 mph) instead of whether visibility is impaired by rain, snow, leaves or a sensor malfunction, an ODD verification function can test against an AV’s ODD and provide an additional layer of safety, reassurance, and information to developers and consumers alike.
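To make the idea concrete, here is a minimal sketch of what such an ODD verification check could look like. All names, the interface, and the thresholds are hypothetical, borrowing only the visibility example above; the report does not specify an implementation.

```python
from dataclasses import dataclass

@dataclass
class OddRequirements:
    """Technology-neutral ODD thresholds (illustrative values from the text)."""
    min_detection_range_ft: float = 100.0  # must perceive objects this far away
    min_detection_accuracy: float = 0.95   # with at least this accuracy
    max_speed_mph: float = 35.0            # while traveling up to this speed

def odd_verification(detection_range_ft: float,
                     detection_accuracy: float,
                     speed_mph: float,
                     req: OddRequirements = OddRequirements()) -> bool:
    """Return True if measured perception capability satisfies the ODD.

    The check is deliberately cause-agnostic: it does not care whether
    degraded visibility comes from rain, snow, leaves, or a sensor fault,
    only whether measured capability meets the operational requirement.
    A False result would trigger devolving to a minimum risk condition.
    """
    if speed_mph > req.max_speed_mph:
        return False  # outside the speed envelope this requirement covers
    return (detection_range_ft >= req.min_detection_range_ft
            and detection_accuracy >= req.min_detection_accuracy)
```

In this framing, the same function can be exercised by a third-party tester who degrades the measured inputs, without any knowledge of the AV's internal perception stack.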

Whitepaper on ODD


[1] Fraade-Blanar L, Blumenthal MS, Anderson JM, Kalra N. Measuring Automated Vehicle Safety: Forging a Framework (RR-2662). Santa Monica, CA; 2018. https://www.rand.org/pubs/research_reports/RR2662.html

[2] To understand how ODD is currently used, conceptualizations and categorizations were drawn from the technical literature, policy guidelines, standards including ISO 26262, SOTIF, UL 4600, and Voluntary Safety Self-Assessments. To develop concepts further, subject matter experts from ADS developers, academia, and other sectors were consulted.

[3] SAE. AVSC Best Practice for Describing an Operational Design Domain: Conceptual Framework and Lexicon, AVSC00002202004, 2020: https://www.sae.org/standards/content/avsc00002202004/

Deep Dive Into AAA’s Latest Active Driving Assistance Report

GoMentum testing team leads closed-course evaluation of ADA systems

Authored by Atul Acharya

Today, AAA published the results of testing active driving assistance (ADA) functions available on several commercially available vehicles. The results show that critical ADA functions that drivers rely on, namely Lane Keep Assist (LKA) and Adaptive Cruise Control (ACC), fall short of expectations. These ADA functions — categorized as SAE Level 2 (L2) automation — are a subcategory of the generally known advanced driver assistance systems (ADAS).


AAA’s Automotive Engineering Group regularly performs research that benefits AAA’s 60 million members and the general public; this research is executed on commercially available vehicles (not on research prototypes). Previous research examined important ADAS functions such as automatic emergency braking (AEB) technology, revealing shortcomings of various available systems. Additional research recommended renaming various ADAS functions, a position now endorsed by SAE, noting how the commercial names for ADAS functions have become too confusing for consumers. These studies are independent evaluations and aim to be objective in their methodology and findings.

In the same vein, the recently concluded research aimed to examine the limitations of lane keep assistance and adaptive cruise control acting as one system. This L2 feature forms the core of automation functionality as vehicles grow more complex on their way toward full automation. As auto OEMs launch more ADAS features, it is imperative that motorists and consumers get an unbiased view of their benefits and limitations. Thus, the project sought to identify the limitations of active driving assistance and inform both consumers and OEMs of these systems' performance, with the goal of improving them.

Traffic jam simulation

Why GoMentum?

The latest research was led by AAA’s Automotive Engineering Group, in collaboration with AAA Northern California’s AV Testing team at GoMentum Station, and Automobile Club of Southern California’s Automotive Research Center. The test plan included two equally important parts: closed-course testing at GoMentum Station, and naturalistic driving on highways between Los Angeles and San Francisco. The tests were conducted over a period of a few weeks in late October and early November 2019. The work at GoMentum was led by Atul Acharya along with Paul Wells.

GoMentum Station was specifically chosen for closed-course testing as it is one of the premier sites for AV and ADAS testing, and includes features such as a 1-mile-long straight road in the Bunker City test area with fresh lane markings, along with curved roads like Kinne Boulevard, which has degraded lane markings. These features are ideally suited for testing Lane Keep Assist functions that rely on lane markings, with the degraded markings offering an additional challenge to the vehicle's sensors. Other areas of Bunker City were used to test Traffic Jam Assist (TJA) functionality, as well as to test the subject vehicle approaching a simulated disabled vehicle.

For closed-course testing, the key questions were:

  • How do vehicles with active driving assistance systems perform during scenarios commonly encountered in highway situations?
  • Specifically: 
    • How well does the lane keep assist system perform?
    • How does a vehicle perform in stop-and-go traffic?
    • How does a vehicle respond to a disabled vehicle on the roadway?


All vehicles were equipped with industry standard equipment such as:

  • OxTS RT 3000 – inertial measurement unit
  • OxTS RT-Range S Hunter – for accurately tracking ranges to target vehicles
  • DEWEsoft CAN interfaces for reading CAN bus messages 
  • DEWEsoft CAM-120 cameras

Target vehicles were equipped with:

  • OxTS RT 3000 and OxTS RT-Range S

Lane survey equipment on 12th Street at GoMentum Station

Testing Methodology Overview

Lane Keep Assist Testing

Sustained lane-keeping functionality is one of the primary capabilities of active driving assistance. To test LKA, the roadway utilized must have visible lane markings. Prior to testing, a lane survey was performed on GoMentum's 12th Street test zone, a straight, 1.2-mile roadway with clear, fresh lane markings. This roadway is well suited to high-speed testing, allowing vehicles to be tested at various speeds. Using the same high-precision lane survey equipment from OxTS, a precise map of the lane markings was created by walking the entire length of the road. The map is then used as an underlay when lane tests are performed.

During testing, the OxTS RT 3000 inertial measurement unit tracks the precise movement of the vehicle under test (VUT) as it moves along the road with the LKA function active. As part of the configuration setup, a polygon is defined in advance that marks the edge boundaries of the VUT. Range data is collected that determines precise lateral distances from the vehicle's polygon boundaries (more specifically, from its leftmost and rightmost points) to the nearest lane markings. All this data is captured at 100 Hz and subsequently plotted. The charts show the vehicle's lane-centering position, as well as its distance to the right and left lane marks. When charted appropriately, the data can show whether the VUT had any bias toward left or right placement when traveling in the lane.
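As a rough illustration of the kind of post-processing described above, the sketch below derives a lane-centering bias from logged left/right lateral distances. This is a hypothetical simplification: the function names and data format are invented here, and the actual tooling used in the study is not specified.

```python
import statistics

def lane_centering_offsets(left_dists_m, right_dists_m):
    """Per-sample offset from lane center, in meters.

    Positive values mean the vehicle sits right of center (more clearance
    on the left than on the right); negative values mean left of center.
    """
    return [(left - right) / 2.0
            for left, right in zip(left_dists_m, right_dists_m)]

def lane_bias_m(left_dists_m, right_dists_m):
    """Mean offset over a run; a nonzero mean indicates left/right bias."""
    return statistics.mean(lane_centering_offsets(left_dists_m, right_dists_m))

# Example: a few 100 Hz samples of distance to the left and right lane marks
left = [1.00, 1.05, 1.10, 1.05]
right = [0.80, 0.75, 0.70, 0.75]
print(lane_bias_m(left, right))  # 0.15: vehicle rides right of center
```

Plotting the per-sample offsets over a full run is what reveals drift or a sustained placement bias, as described in the text.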

Traffic Jam Assistance testing

Stop-and-go traffic situations are frequent while driving on highways. Nominally, adaptive cruise control (ACC) systems will “follow” a lead vehicle at a safe distance, accelerating automatically if the lead vehicle accelerates, and decelerating automatically if the lead vehicle decelerates. Of course, exactly what a “safe distance” is, and just how soon the vehicle accelerates or decelerates depends on the vehicle. Knowing the limits of these systems is important to motorists so that they are aware of potential risks. 

To test traffic-jam assistance, the team utilized a DRI Soft Car 360® on a Low Profile Robotic Vehicle (LPRV) platform. The DRI Soft Car 360® is a foam car mounted on the LPRV platform, which itself can move at speeds of up to 50 mph. With the DRI Soft Car acting as a simulated "lead vehicle", a vehicle under test (VUT) activates its ACC system (by reaching a certain speed, such as 30 mph) and then lets the ACC system follow the lead vehicle automatically. The lead vehicle is then programmed to accelerate for some time, which causes the VUT to accelerate while maintaining a safe distance. Similarly, the lead vehicle is then programmed to decelerate, which causes the VUT to decelerate. The lead vehicle once again accelerates, causing similar stop-and-go behavior in the VUT. At all times, the vehicles' kinematic data is recorded in a data logger. The vehicles are subjected to varying levels of deceleration at 0.3 g, 0.45 g, and 0.6 g, and three runs are performed for each VUT. The following distance, separation distance / time-to-collision at the start of braking, speed differential at the start of braking, and average and maximum instantaneous deceleration are all recorded. When charted out, the data reveals the system performance.
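The recorded braking metrics above can be sketched in a few lines of standard kinematics. This is an illustrative calculation only; the function names are hypothetical and do not reflect the team's actual data-logging toolchain.

```python
G = 9.81  # standard gravity, m/s^2

def time_to_collision_s(separation_m, vut_speed_mps, lead_speed_mps):
    """Time-to-collision at the start of braking.

    Infinite if the VUT is not closing on the lead vehicle.
    """
    closing_speed = vut_speed_mps - lead_speed_mps
    if closing_speed <= 0:
        return float('inf')
    return separation_m / closing_speed

def avg_deceleration_g(speed_start_mps, speed_end_mps, duration_s):
    """Average deceleration over a braking event, expressed in g."""
    return (speed_start_mps - speed_end_mps) / duration_s / G

# Example: VUT at 20 m/s, lead vehicle at 10 m/s, 30 m apart at brake onset
print(time_to_collision_s(30.0, 20.0, 10.0))  # 3.0 seconds
print(avg_deceleration_g(20.0, 0.0, 4.0))     # ~0.51 g
```

Comparing the computed time-to-collision and deceleration across the 0.3 g, 0.45 g, and 0.6 g runs is what exposes how aggressively, and how late, each ACC implementation brakes.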

Simulated Disabled Vehicle approach testing

Driving on highways is often risky. AAA, the largest emergency road services (ERS) provider, handles over 30 million emergency road service requests nationwide every year. Encountering a disabled vehicle on the highway is a risky scenario for motorists. The team wanted to find out how active driving assistance systems react when faced with such a situation.

To create a disabled vehicle situation, the team created a simulated scenario with the DRI Soft Car 360 placed halfway on the roadway, with 50% of the soft car in the travel lane and the other 50% on the right shoulder. A vehicle under test is then subjected to this situation, and its ADA system's reaction is noted.


So how did these vehicles perform? While active driving assistance systems mostly worked the way they were designed, there were notable shortcomings in their performance when the systems were pushed to their limits. Consumers and motorists should always be vigilant and attentive when driving, and be ready to take over at a moment's notice whenever these L2 automation systems are active.

To learn more about the ADA L2 Testing, please download the full report.

If you are an AV or ADAS developer, or a technology vendor working on core components of automation, and would like to confidentially test the limits of your system, or to learn more about the ADA L2 Testing project, please get in touch with Atul Acharya or Paul Wells at: [email protected] 

Top 5 features at GoMentum Station for testing AVs


At GoMentum Station, 2,100 acres and 20+ miles of roadway are available to our customers to test their ADAS, autonomous and connected technologies, only 65 miles from Silicon Valley. Here are the top 5 features that make GoMentum Station the premier testing destination for our customers:

#1: Bunker City has it all: multi-directional lanes, various striping patterns, traffic lights, bike and pedestrian infrastructure, and speeds from 15 to 55 mph. Set among hundreds of former munitions bunkers (hence the name), Bunker City proves challenging in many ways, from multipath navigation options to localization in a homogeneous environment.


#2: Downtown is our take on ‘Main Street’ America. From teams’ operational base at our WWII firehouse, vehicles can train on freshly paved streets that mimic a suburban environment. With up to four travel lanes, center turn lanes, on-street parking, bus stops and bike lanes, Downtown will put even the best passenger or delivery service vehicles to the test.


#3: 12th Street – Need a one-mile straightaway that looks like every stretch of highway in America? We’ve got you covered. Teams use 12th Street to test ADAS features such as lane change assist, adaptive cruise control and emergency braking.



#4: Tunnel Road – Want to challenge your communication and perception systems? GoMentum has two 600-foot tunnels built of two-foot-thick concrete encased in a sheet of corrugated metal. Stark light contrast at all hours of the day will blind both human eyes and cameras – will your vehicle see the pedestrian crossing at the end of the tunnel?


#5 Kinne Boulevard – Our longest stretch of road at 4.5 miles, Kinne Boulevard offers the opportunity to test continuous driving with various straightaways and curves.

To find out more about the unique features at GoMentum Station, contact us to schedule a tour of the proving ground.

GoMentum Presents at SAE World Congress WCX 2020 Digital Summit

GoMentum Station presents two key papers at SAE WCX 2020, on testing automated driving systems, and on a novel collision avoidance safety metric.

We are excited to announce that research performed by AAA’s AV Testing team, in collaboration with two key partners, is being presented at the SAE World Congress 2020.

Of the many technical standards groups and industry conferences, SAE and World Congress stand apart. This year, in place of the Detroit event, we are excited to support SAE WCX virtually and showcase our research via the Digital Summit.

Both papers, and their oral presentations, are available for on-demand viewing at the SAE WCX website in the Body / Chassis / Occupant and Pedestrian Safety Structure technical sessions category.

Modes of Automated Driving System Scenario Testing: Experience Report and Recommendations (SAE Paper 2020-01-1204) 

This research, performed in collaboration with University of Waterloo’s Professor Krzysztof Czarnecki, Michal Antkiewicz and team at the Waterloo Intelligent Systems (WISE) lab, explores testing autonomous vehicles using four different modes including simulation, mixed reality testing, and test-track testing. The team tested UW’s automated vehicle, dubbed “UW Moose”, through six rigorous scenario tests in different modes, and compared and contrasted their benefits and drawbacks. The paper closes with 12 recommendations on choosing testing modes for automated driving systems.

The SAE paper 2020-01-1204 may be purchased here

Development of a Collision Avoidance Capability Metric (SAE Paper 2020-01-1207)

This research paper discusses the development and application of a novel metric for evaluating and quantifying the capability of a vehicle / controller (including a human driver) to avoid potential future collisions. The metric was developed in partnership with DRI, and is applicable to potentially any scenario, including with multiple actors and roadside objects. 

The SAE paper 2020-01-1207 may be purchased here


To discuss these, and other research at GoMentum, feel free to contact Atul Acharya or Paul Wells at [email protected] 

Accelerating ADAS and AV Development with the GoMentum Digital Twin

Side-by-side of real world and virtual

GoMentum Station and Metamoto announce a new digital twin simulator to accelerate the development and testing of ADAS features and automated driving systems

Authored by Paul Wells

Applying autonomy to safety-critical applications — most notably, autonomous vehicles — requires extensive verification and validation. The majority of this testing takes place in scalable simulation tools, while much of the final validation and verification is still accomplished in the physical world. As such, correlation between these test modes remains an important consideration when evaluating and advancing the overall efficiency of test efforts.

Metamoto and GoMentum Station are excited to partner to offer the GoMentum Bunker City scene within Metamoto’s massively scalable, cloud-based simulator to help promote this connectivity. Using the GoMentum scene, developers are able to both drive efficiency and go deeper into specific validation subdomains. Contact us for more information about the digital twin.


Physical testing is notoriously resource intensive. To help drive costs down, developers and test engineers have two complementary strategies: perform as much virtual testing as possible, and identify the smallest, but still significant, set of physical tests required to produce meaningful assurance and test results. Simulation, when integrated with physical test environments, is an incredibly powerful ally in both of these efforts.


Virtual bunkers

In addition to supporting the overall development and advancement of an autonomous stack, virtual testing within a digital twin environment effectively allows for a faster and more efficient bug discovery process before deploying hardware and drivers into a physical environment. This reduces the likelihood of time-consuming troubleshooting that pops up during the precious time spent either on a track or on public roads. GoMentum, for example, is an unusual environment. The use of the Metamoto scene and simulator allows for virtual exploration of the environment such that static objects in the environment, like bunkers and tall grasses, and dynamic agents, such as cows, turkeys and the like, do not disrupt time spent on the track. Turnaround times thereby become faster, and the cycle between virtual and physical testing continues efficiently.

Empty road

Parity Testing

An added benefit of leveraging digital twins is the potential for insight into the conformance of virtual and physical test results. Due to the increasing importance of, and reliance on, simulation testing, understanding the relationship between virtual and physical test results is critical. This exploration of parity testing allows for greater awareness of the relationship between sample use cases and the model fidelities leveraged in simulation. While the goal of this awareness is principally safety, it also has the potential to drive even greater use of simulation tools.

As part of the new GoMentum Bunker City Digital Twin, GoMentum and Metamoto invite industry and academic participants to partake in research projects covering feature development, parity testing, safety testing, and more. 

Digital image


For more information, reach out to [email protected] and [email protected].

Select experiences in testing AV perception systems 

GoMentum Station and AAA Northern California, Nevada & Utah partner with University of Waterloo Center for Automotive Research (WatCAR) to validate AV safety performance and compare results across virtual and physical scenario tests.

Authored by Paul Wells

A key challenge in validating autonomous vehicles — and, more broadly, safety-critical applications of AI — is the brittleness of machine learning (ML) based perception algorithms [1] [2]. This challenge is significant because errors in sensing & perception affect all downstream modules; if a system cannot properly detect its environment, it will have a diminished likelihood of completing its goal within this same environment. Although focused on full-vehicle testing via structured scenarios, our work with University of Waterloo highlighted this key issue. 

Our research subjected Waterloo's automated research vehicle, Autonomoose, to several predefined scenarios: roundabout with road debris, traffic jam assist, standard pedestrian crossing, intersection pedestrian crossing, and stop sign occlusion.

These scenarios were tested first in simulation using the University's WiseSim platform and the GeoScenario SDL, then re-tested physically on a closed course. Although intended as a broad exploration of the utility of controlled physical testing for autonomous vehicle validation, this project nonetheless surfaced findings which, albeit specific to our given tech stack, reinforce the otherwise well-documented challenges in validating autonomous systems.

Vehicle and test dummy

A few highlights of our experience paper, specifically as related to test result discrepancies due to the perception system, can be summarized as follows:

Simulation architecture in this case provided “perfect perception”. As such, our virtual tests assumed that all actors were perceived and understood by the system, in this case a modified instance of Baidu’s Apollo. This assumption led to large discrepancies between virtual and physical tests, especially in scenarios containing pedestrians. Once the vehicle was introduced to the physical environment, perception-related deficiencies resulted in a large number of inconsistent or completely missed detections. During the pedestrian crossing, for instance, the AV struggled with intermittent loss of pedestrian perception due to issues with the object detection module. In simulation, however, the pedestrian was readily detected. Our video at the top of the fold shows performance on the physical track, while the video below shows behavior in simulation.
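This “perfect perception” gap can be sketched in a few lines. Everything below — the function names, actor list, and miss rate — is a purely illustrative assumption, not the actual WiseSim or Apollo interfaces:

```python
import random

# Hypothetical sketch: in simulation, the planner consumed ground-truth
# actor lists directly; on the vehicle, the same planner received
# detections from an imperfect object detector.

def perfect_perception(ground_truth_actors):
    """Simulation path: every actor is perceived in every frame."""
    return list(ground_truth_actors)

def real_perception(ground_truth_actors, rng, miss_rate=0.3):
    """Physical path: each actor may be intermittently missed."""
    return [a for a in ground_truth_actors if rng.random() > miss_rate]

actors = ["pedestrian", "parked_car"]
rng = random.Random(0)  # fixed seed for a reproducible illustration

sim_frames = [perfect_perception(actors) for _ in range(100)]
real_frames = [real_perception(actors, rng) for _ in range(100)]

# In simulation the pedestrian appears in all 100 frames; with
# intermittent misses, the planner sees track drops it never
# encountered virtually.
sim_detections = sum("pedestrian" in frame for frame in sim_frames)
real_detections = sum("pedestrian" in frame for frame in real_frames)
```

Even this toy model shows why downstream behavior diverges: a planner tuned against uninterrupted tracks has never had to handle the detection gaps the physical system produces.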

Sensor fidelity in simulation was limited. Further highlighting the importance of closely tracked sensor model fidelity, the modeled LIDAR beam pattern used in simulation did not match the real specs of the on-vehicle sensor. This issue was uncovered due to conflicting virtual-physical test results within a road debris scenario, designed to assess the vehicle’s ability to detect small, stationary objects. As described in our paper, “The scenario exposed some lack of fidelity in the simulated lidar which caused inconsistent road debris detection behavior in simulation compared to the closed course. The lidar mounted on the roof of the SV had a non-linear vertical beam distribution, whereby the beams were grouped more densely around the horizon, and were too sparse to reliably detect the debris on the ground. In contrast, the simulated lidar had a linear vertical beam distribution, i.e., the beams were spaced out evenly. Consequently, implementing the non-linear beam distribution in the simulated lidar resulted in SV behavior in simulation consistent with the SV behavior on the closed course.”  Pictured below, the Autonomoose fails to detect a cardboard box on the road during physical testing.
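The beam-distribution mismatch described in the quote can be made concrete with a small geometric sketch. The beam count, field of view, and box geometry below are illustrative assumptions, not the actual sensor specifications from the study:

```python
import math

# Compare two vertical beam distributions for a roof-mounted lidar:
# evenly spaced (as originally simulated) vs. clustered near the
# horizon (as on the real sensor). All numbers are illustrative.

NUM_BEAMS = 32
LO, HI = -25.0, 15.0   # vertical field of view, degrees
SENSOR_HEIGHT = 1.8    # m, roof-mounted lidar
BOX_RANGE = 30.0       # m, distance to the road debris
BOX_HEIGHT = 0.3       # m, cardboard box sitting on the ground

def linear_beams():
    """Evenly spaced elevation angles, like the original simulated lidar."""
    return [LO + i * (HI - LO) / (NUM_BEAMS - 1) for i in range(NUM_BEAMS)]

def horizon_weighted_beams():
    """Angles clustered near the horizon, like the real sensor."""
    scale = max(abs(LO), HI)
    # Cubic warp of the even grid concentrates beams around 0 degrees.
    return [(a / scale) ** 3 * scale for a in linear_beams()]

def hits_box(angle_deg):
    """Beam reaches a height between 0 and BOX_HEIGHT at the box's range."""
    z = SENSOR_HEIGHT + BOX_RANGE * math.tan(math.radians(angle_deg))
    return 0.0 <= z <= BOX_HEIGHT

linear_hits = sum(hits_box(a) for a in linear_beams())
real_hits = sum(hits_box(a) for a in horizon_weighted_beams())
# The evenly spaced pattern catches the box, while the horizon-weighted
# pattern's ground-facing beams are too sparse and miss it entirely.
```

Under these assumed numbers, the evenly spaced lidar lands a return on the box while the horizon-weighted one does not — the same qualitative discrepancy that surfaced in our road debris scenario.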

Environmental fidelity was limited. Finally, our use of WiseSim primarily involved a re-creation of the road network — but not the visual scene — present during closed-course testing. This created minor complications when Autonomoose was ultimately brought onto the physical track. Principally, uneven road slopes at the physical track created unexpected LiDAR returns and false detections onboard the vehicle. Because WiseSim did not recreate ground plane measurements, we had to spend time debugging at the track. This reiterates the need for close tracking of virtual model fidelity when using simulation to prepare for track tests and, more broadly, when performing correlations between virtual & physical tests.

Although these findings may not all generalize to the broader domain of AV testing, they nonetheless provide concrete instances of theoretical challenges facing the field. We will continue exploring these challenges with industry partners and sharing results. We also welcome inquiries about the scenarios used, technical assets created, and data sharing.

The full SAE report is available for purchase and download on the SAE website.

For further info about the GoMentum testing team, please email us!


Understanding Unsettled Challenges in Autonomous Driving Systems

Recently published SAE EDGE Reports, co-authored by the test and research team at AAA Northern California, Nevada & Utah, highlight the key issues that the autonomous vehicle industry continues to face.

Authored by Atul Acharya

SAE EDGE Reports 2019
SAE EDGE Reports on Simulation and Balancing ADS Testing

The promise of highly automated vehicles (HAVs) has long been in reducing crashes, easing traffic congestion, and ferrying passengers and delivering goods safely — all while providing more accessible and affordable mobility and enabling new business models.

However, the development of these highly complex, safety-critical systems is fraught with extremely technical, and often obscure, challenges. These systems are typically composed of four key modules, namely:

(i) the perception module, which understands the environment around the automated vehicle using sensors like cameras, lidars, and radars,

(ii) the prediction module, which predicts where all other dynamic actors and agents (such as pedestrians, bicyclists, vehicles, etc.) will be moving in the next 2-10 seconds,

(iii) the planning module, which plans the AV’s own path, taking into account the scene and dynamic constraints, and

(iv) the control module, which executes the trajectory by sending commands to the steering wheel and the motors.
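The four modules above form a sequential pipeline, each consuming the output of the previous one. A schematic sketch (all classes, thresholds, and logic are toy placeholders, not any particular AV stack):

```python
from dataclasses import dataclass
from typing import List

# Toy schematic of the four-module ADS pipeline described above.

@dataclass
class Actor:
    kind: str          # "pedestrian", "vehicle", ...
    position: float    # 1-D position along the road (m)
    speed: float       # m/s (negative = approaching the ego vehicle)

def perceive(sensor_frame: List[Actor]) -> List[Actor]:
    """Perception: build a model of the environment from sensor data."""
    return sensor_frame  # stand-in for detection + tracking

def predict(actors: List[Actor], horizon_s: float = 2.0) -> List[Actor]:
    """Prediction: extrapolate each actor a few seconds into the future."""
    return [Actor(a.kind, a.position + a.speed * horizon_s, a.speed)
            for a in actors]

def plan(ego_position: float, predicted: List[Actor]) -> str:
    """Planning: choose a maneuver subject to the predicted scene."""
    blocked = any(abs(a.position - ego_position) < 10.0 for a in predicted)
    return "brake" if blocked else "proceed"

def control(maneuver: str) -> dict:
    """Control: turn the planned maneuver into actuator commands."""
    if maneuver == "brake":
        return {"throttle": 0.0, "brake": 1.0}
    return {"throttle": 0.3, "brake": 0.0}

# One tick of the pipeline: a pedestrian 12 m ahead, walking toward us.
frame = [Actor("pedestrian", position=12.0, speed=-1.5)]
command = control(plan(ego_position=0.0, predicted=predict(perceive(frame))))
```

The linear dataflow is exactly why perception errors are so consequential: a missed detection in the first stage silently propagates through prediction, planning, and control.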

If you are an AV developer working on automated driving systems (ADS), or perhaps even advanced driver assistance systems (ADAS), you are already using various tools to make your job easier. These tools include simulators of various kinds — such as scenario designers, test coverage analyzers, sensor models, vehicle models — and their simulated environments. These tools are critical in accelerating the development of ADS. However, it is equally important to understand the challenges and limitations in using, deploying and developing such tools for advancing the benefits of ADS.

We in the AV Testing team at GoMentum actively conduct research in AV validation, safety, and metrics. Recently, we had an opportunity to collaborate with leading industry and academic partners to highlight these key challenges. Organized by SAE International and convened by Sven Beiker, PhD, founder of Silicon Valley Mobility, two workshops were held in late 2019 to better understand various AV testing tools. The workshop participants included Robert Siedl (Motus Ventures), Chad Partridge (CEO, Metamoto), Prof. Krzysztof Czarnecki and Michał Antkiewicz (both of University of Waterloo), Thomas Bock (Samsung), David Barry (Multek), Eric Paul Dennis (Center for Automotive Research), Cameron Gieda (AutonomouStuff), Peter-Nicholas Gronerth (fka), Qiang Hong (Center for Automotive Research), Stefan Merkl (TUV SUD America), John Suh (Hyundai CRADLE), and John Tintinalli (SAE International), along with AAA’s Atul Acharya and Paul Wells.

Simulation Challenges

One of the key challenges in developing ADS is building accurate, realistic, reliable, and predictable models for various sensors (cameras, lidars, radars), for actors and agents (such as vehicles of various types, pedestrians, etc.), and for the world environment around them. These models are used for verification and validation (V&V) of advanced features. Balancing the model fidelity (“realism”) of key sensors and sub-systems while developing the product is a key challenge. The workshop addressed these important questions:

  1. How do we make sure simulation models (such as for sensors, vehicles, humans, environment) represent real-world counterparts and their behavior?
  2. What are the benefits of a universal simulation model interface and language, and how do we get to it?
  3. What characteristics and requirements apply to models at various levels, namely, sensors, sub-systems, vehicles, environments, and human drivers?

To learn more about these and related issues, check out the SAE EDGE Research Report EPR2019007:


Balancing ADS Testing in Simulation Environments, Test Tracks, and Public Roads

If you are a more-than-curious consumer of AV news, you might be familiar with various AV developers proudly stating that they have “tested with millions or billions of miles” in simulation environments, or “millions of miles” on public roads. How do simulation miles translate into real-world miles? Which matters more? And why?

If you have thought of these questions, then the second SAE EDGE report might be of interest.

The second workshop focused on a broader theme: How should AV developers allocate their limited testing resources across different modes of testing, namely simulation environments, test tracks, and public roads? Given that each mode of testing has its own benefits and limitations, and can accelerate or hinder development of ADS accordingly, this question is of paramount importance if the balance of limited resources is askew.

This report seeks to address the three most critical questions:

  1. What determines how to test an ADS?
  2. What is the current, optimal, and realistic balance of simulation testing and real-world testing?
  3. How can data be shared in the industry to encourage and optimize ADS development?

Additionally, it touches upon other challenges such as:

  • How might one compare virtual and real miles?
  • How often should vehicles (and their subsystems) be tested? And in what modes?
  • How might (repeat) testing be made more efficient?
  • How should companies share testing scenarios, data, methodologies, etc. across industry to obtain the maximum possible benefits?

To learn more about these and related challenges, check out the SAE EDGE Research Report EPR2019011:


Physical Test Efficiency via Virtual Results Analysis

AAA Northern California, UC Berkeley and LG Silicon Valley Labs partner to examine the use of digital twin and parameterized testing to drive efficiency in closed-course testing. For a summary of the study, download the AAA-UCB-LG AV Testing Project Whitepaper.

Authored by Paul Wells

Physical test execution at GoMentum Station. Pedestrian scenario along Pearl St. in “Downtown” zone.

UPDATE: On August 20, 2020 UC Berkeley hosted a webinar on the research outlined in this article and tools leveraged in the work. Visit the YouTube link for an in-depth overview.

Simulation and physical testing are widely known to be complementary modes of automated vehicle verification & validation (V&V). While simulation excels in scalable testing of software modules, physical testing provides the ability for full-vehicle, everything-in-the-loop testing and “real life” evaluations of sensing stacks or downstream perception / sensor fusion layers. These tradeoffs, in part, contribute to the test distribution approach put forth by Safety First For Automated Driving (SaFAD).

Matrix that maps test modes to use cases

“Test Platform and Test Item”. Safety First for Automated Driving, Chapter 3: Verification & Validation (2019).

Due to the ability of simulation to scale rapidly, much work is currently underway to resolve its core limitation (fidelity). Developments in modeling (sensors, environment, vehicle dynamics, human drivers, etc.) and fast, photorealistic scene rendering all stand to expand the scope of validation exercises that can be performed virtually. Much less studied, however, are the means by which physical testing can improve upon its core limitation (efficiency). Whereas the speed of virtual testing allows for near constant re-submission and re-design of test plans, the physical world is much less forgiving. Being strategic about time spent on a track is therefore vital to maximize the efficiency of physical testing. Although many inputs into efficient physical testing are fixed (e.g. the inherent slowness in working with hardware), it is unclear whether the utility of physical test data also scales linearly alongside the volume of test execution. In other words, are the results of all physical tests equally valuable, or are there certain parameters within a given scenario which, if realized concretely at the track, would result in more valuable insights? If so, how might one discover these higher-value parameters using a virtual test environment?

These questions were central to our three-way research partnership between GoMentum Station (owned and operated by AAA Northern California), UC Berkeley VeHiCaL, and LG Silicon Valley Lab (LG SVL). Accordingly, we modeled a digital twin of GoMentum in the LG Simulator and leveraged an integration between Berkeley’s Scenic scenario description language (Scenic) and LG’s simulation platform. We selected a pedestrian crossing scenario and used the Scenic-LG integration to parameterize the scenario along several relevant vectors, executing nearly a thousand different virtual test cases. The results of one such test set were as follows, where rho is a proxy for the minimum distance between the ego vehicle and pedestrian target. Within this plot, we identified the clustering patterns along rho to be most interesting. As such, we selected eight cases for physical test execution: two failures (F), three successes (S), and three marginal (M) cases, where failure cases exhibited minimum distance values less than or equal to 0.4 meters. In short, the results from physical testing established safe/marginal/unsafe parity across test modes.

3-D plot depicting simulation test results

Each point represents a test case. X = pedestrian start delay (s), Y = pedestrian walk distance (m), Z = pedestrian hesitate time (s). Rho = proxy for minimum distance.
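The case-selection step can be sketched as follows. The 0.4-meter failure threshold comes from the study; the marginal cutoff and the sample rho values are illustrative assumptions:

```python
# Select physical test candidates from virtual results by rho,
# the proxy for minimum ego-pedestrian distance.

FAIL_MAX = 0.4      # rho <= 0.4 m counted as a failure in the study
MARGINAL_MAX = 1.0  # assumed cutoff separating marginal from success

def classify(rho: float) -> str:
    if rho <= FAIL_MAX:
        return "F"  # failure
    if rho <= MARGINAL_MAX:
        return "M"  # marginal
    return "S"      # success

# One rho value per virtual run (illustrative sample data)
virtual_results = {"run_01": 0.10, "run_02": 0.35, "run_03": 0.70,
                   "run_04": 0.95, "run_05": 1.80, "run_06": 3.20}

by_class = {"F": [], "M": [], "S": []}
for run, rho in sorted(virtual_results.items(), key=lambda kv: kv[1]):
    by_class[classify(rho)].append(run)

# A physical test plan then draws a mix from each bucket, analogous to
# the 2 F / 3 M / 3 S split used in the study.
```

Sorting and bucketing virtual runs this way is what replaces the guesswork of hand-tuning scenario parameters at the track: the simulator has already told us which concrete parameter combinations sit near the failure boundary.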

While the concept of correlating virtual and physical tests is not in itself novel, our results provide evidence to suggest that parameterization and analysis of virtual test results can be used to inform physical test plans. Specifically, the framework of recreating “low-rho” failure cases physically and within a reasonable degree of conformance to virtual runs allows for the capture of rich ground-truth data pertaining to Vehicle Under Test (VUT)-specific critical cases — all without the guesswork involved in manually tuning scenario parameters at the physical test site. Because we were able to tease out deficiencies of the AV stack after running only ten-odd test cases, the utility of data captured relative to time spent onsite was significant. As compared to relatively unstructured physical test exercises or arbitrarily assigned scenario parameters, this framework represented an efficient means of both discovering and recreating physical critical cases.

Stepping outside this framework and the original research scope, our results also provide evidence of several challenges in validating highly automated vehicles (HAV). Even within our relatively low volume of tests, one test case (M2) produced a false negative when transitioning from virtual to physical testing — underscoring the importance of testing both virtually and physically, as well as the difficulty in interpreting simulation results or using them as a proxy for overall vehicle competence. We were also surprised by the exceptionally sensitive tolerances within each parameter. In certain cases, for instance, the difference between collision and no collision was a matter of milliseconds in pedestrian hesitation time. This underscores both the brittleness and, to a lesser extent, the non-determinism of machine learning algorithms — two of the broader challenges in developing AI systems for safety-critical applications.

Physical test execution at GoMentum Station. Pedestrian scenario along Pearl St. in “Urban Zone”.

Importantly, these challenges face industry and independent assessors alike. Current test protocols for agencies like EuroNCAP are very rigid. Vehicle speed, a primary test parameter in an AEB VRU test protocol for instance, varies along a step function with a range from ~20-60 kph and increments of 5 kph. While perhaps suitable for L2 systems where drivers remain the primary line of defense, this approach clearly contradicts the parameter sensitivities exhibited above. If independent assessors hope to draw meaningful conclusions from the results of physical test exercises, these exercises will need to be highly contrived — i.e. not only will the scenarios need to be chosen according to a particular operational design domain (ODD), but the parameters used to concretize physical tests should in fact be assigned by inquiry — perhaps even VUT-specific inquiry — instead of by top-down, industry-wide mandate. This could necessitate an assessment framework built around coverage and safety case development rather than test-by-test scoring and comparison — an approach encouraged by work from groups like UL 4600 and Foretellix.
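A small sketch illustrates why a fixed-step protocol can miss such narrow failure bands. The 20-60 kph range and 5 kph increments mirror the protocol shape described above; the failure band itself is a hypothetical illustration of the parameter sensitivity we observed:

```python
# Contrast a fixed 5 kph test grid with a narrow, hypothetical
# failure band that falls entirely between grid points.

protocol_speeds = list(range(20, 65, 5))   # 20, 25, ..., 60 kph

def collides(speed_kph: float) -> bool:
    """Hypothetical VUT: fails only within a narrow speed band."""
    return 31.0 < speed_kph < 34.0

# The coarse protocol grid never samples inside the band...
grid_detects_failure = any(collides(v) for v in protocol_speeds)

# ...while a fine 0.1 kph sweep over the same range does.
fine_sweep = [20 + 0.1 * i for i in range(401)]
sweep_detects_failure = any(collides(v) for v in fine_sweep)
```

The point is not that assessors should sweep every parameter at 0.1 kph resolution — that is infeasible physically — but that the parameters worth testing physically should be discovered by inquiry (e.g. virtual parameter searches like the one above) rather than fixed in advance.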

Many open questions remain in examining HAV test efficiency and overall HAV validation processes. We look forward to using the insights above in 2020 to delve deeper into research and to continue forging relationships with industry. We will be engaging in subsequent projects within the SAE IAMTS coalition to further explore both the toolchain and process used for correlating virtual and physical results.

This work also generated a number of outputs that we look forward to sharing with the industry. All videos from our track testing at GoMentum are available here, with recorded simulation runs here. The underlying datasets — both from ground truth at the track and outputs of the simulation — are also being used for further analysis and safety metric development.