Deep Dive Into AAA’s Latest Active Driving Assistance Report

GoMentum testing team leads closed-course evaluation of ADA systems

Authored by Atul Acharya

Today, AAA published the results of testing active driving assistance (ADA) functions available on several commercially available vehicles. The results show that critical ADA functions, namely Lane Keep Assist (LKA) and Adaptive Cruise Control (ACC) that drivers rely on, fall short of expectations. These ADA functions — categorized as SAE Level 2 (L2) automation — are a subcategory of the generally known advanced driver assistance systems (ADAS). 

Why

AAA’s Automotive Engineering Group regularly performs research that benefits AAA’s 60 million members and the general public; this research is executed on commercially available vehicles (not on research prototypes). Previous research examined important ADAS functions such as automatic emergency braking (AEB) technology, revealing shortcomings of various available systems. Additional research recommended renaming various ADAS functions, a position now endorsed by SAE, noting how the commercial names for ADAS functions have become too confusing for consumers. These studies are independent evaluations and aim to be objective in their methodology and findings.

In the same vein, the recently concluded research aimed to examine the limitations of lane keep assistance and adaptive cruise control acting as one system. This L2 feature forms a core of automation functionality as vehicles become more complex on their way towards full automation. As more auto OEMs launch more ADAS features, it is imperative that motorists and consumers get an unbiased view of their benefits and limitations. Thus, the aim of the project was to find limitations of active driving assistance and inform both consumers and OEMs on their performance with the aim of improving them.

Traffic jam simulation

Why GoMentum?

The latest research was led by AAA’s Automotive Engineering Group, in collaboration with AAA Northern California’s AV Testing team at GoMentum Station, and Automobile Club of Southern California’s Automotive Research Center. The test plan included two equally important parts: closed-course testing at GoMentum Station, and naturalistic driving on highways between Los Angeles and San Francisco. The tests were conducted over a period of a few weeks in late October and early November 2019. The work at GoMentum was led by Atul Acharya along with Paul Wells.

GoMentum Station was specifically chosen for closed-course testing as it is one of the premier sites for AV and ADAS testing. Its features include 1-mile-long straight roads in the Bunker City test zone with fresh lane markings, along with curved roads such as Kinne Boulevard that have degraded lane markings. These features are ideally suited for testing Lane Keep Assistance functions that rely on lane markings, with the degraded lane markings offering an additional challenge to the vehicle’s sensors. Other areas of Bunker City were used to test Traffic Jam Assist (TJA) functionality, as well as the subject vehicle’s approach to a simulated disabled vehicle.

For closed-course testing, the key questions were:

  • How do vehicles with active driving assistance systems perform during scenarios commonly encountered in highway situations?
  • Specifically: 
    • How well does the lane keep assist system perform?
    • How does a vehicle perform in stop-and-go traffic?
    • How does a vehicle respond to a disabled vehicle on the roadway?

Instrumentation

All vehicles were equipped with industry standard equipment such as:

  • OxTS RT 3000 – inertial measurement unit
  • OxTS RT-Range S hunter – for accurately tracking ranges to target vehicles
  • DEWEsoft CAN interfaces for reading CAN bus messages 
  • DEWEsoft CAM-120 cameras

Target vehicles were equipped with:

  • OxTS RT 3000 and OxTS RT-Range S

Lane survey equipment on 12th Street at GoMentum Station

Testing Methodology Overview

Lane Keep Assist Testing

Sustained lane-keeping functionality is one of the primary capabilities of active driving assistance. To test LKA, the roadway must have visible lane markings. Prior to testing, a lane survey was performed on GoMentum’s 12th Street test zone, a straight, 1.2-mile roadway with clear and fresh lane markings. This roadway is ideal for high-speed testing, so vehicles can be tested at various speeds. Using the same high-precision lane survey equipment from OxTS, a precise map of lane markings was created by walking the entire length of the road. The map is then used as an underlay when lane tests are performed.

During testing, the OxTS RT 3000 inertial measurement unit tracks the precise movement of the vehicle under test (VUT) as it moves along the road with the LKA function active. A polygon marking the edge boundaries of the VUT is defined beforehand as part of the configuration setup. Range data is collected to determine the precise lateral distances from the vehicle’s polygon boundaries (more specifically, from its leftmost and rightmost points) to the nearest lane markings. All this data is captured at 100 Hz and subsequently plotted. The charts show the vehicle’s lane centering position, as well as its distance to the right lane mark and the left lane mark. When charted appropriately, the data can show whether the VUT had any bias towards left/right placement when traveling in the lane.
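The left/right bias described above can be illustrated with a minimal sketch. It assumes the logged data has already been reduced to two lists of lateral distances (left edge to left marking, right edge to right marking), one pair per 100 Hz sample. The function names and sample values are hypothetical and are not taken from the AAA test tooling.

```python
from statistics import mean

def lane_offsets(left_dist_m, right_dist_m):
    """Signed offset from lane centre for each sample, in metres.

    left_dist_m:  distance from the vehicle's leftmost point to the left marking
    right_dist_m: distance from the vehicle's rightmost point to the right marking
    A positive offset means the vehicle sits right of centre
    (further from the left marking, closer to the right one).
    """
    return [(left - right) / 2.0 for left, right in zip(left_dist_m, right_dist_m)]

def lane_bias(left_dist_m, right_dist_m):
    """Mean signed offset over a run; near zero means no left/right bias."""
    return mean(lane_offsets(left_dist_m, right_dist_m))

# Hypothetical samples (metres); a real run would hold 100 samples per second.
left = [0.80, 0.82, 0.85, 0.83]
right = [0.70, 0.68, 0.65, 0.67]
print(f"mean offset: {lane_bias(left, right):+.3f} m (positive = right of centre)")
```

A shift of d metres to the right lengthens the left distance by d and shortens the right distance by d, which is why the difference is halved.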

Traffic Jam Assistance testing

Stop-and-go traffic situations are frequent while driving on highways. Nominally, adaptive cruise control (ACC) systems will “follow” a lead vehicle at a safe distance, accelerating automatically if the lead vehicle accelerates, and decelerating automatically if the lead vehicle decelerates. Of course, exactly what a “safe distance” is, and just how soon the vehicle accelerates or decelerates depends on the vehicle. Knowing the limits of these systems is important to motorists so that they are aware of potential risks. 

To test traffic-jam assistance, the team utilized a DRI Soft Car 360® on a Low Profile Robotic Vehicle (LPRV) platform. The DRI Soft Car 360® is a foam car mounted on the LPRV platform, which itself can move at speeds of up to 50 mph. With the DRI Soft Car acting as a simulated “lead vehicle”, a vehicle under test (VUT) activates its ACC system (by reaching a certain speed, such as 30 mph) and then lets the ACC system follow the lead vehicle automatically. The lead vehicle is then programmed to accelerate for some time, which causes the VUT to accelerate while maintaining a safe distance. Similarly, the lead vehicle is then programmed to decelerate, which causes the VUT to decelerate. The lead vehicle once again accelerates, causing similar stop-and-go behavior in the VUT. At all times, the vehicles’ kinematic data is recorded in a data logger. The vehicles are subjected to three levels of deceleration (0.3 g, 0.45 g, and 0.6 g), and three runs are performed for each VUT. The following distance, the separation distance / time-to-collision at start of braking, the speed differential at start of braking, and the average and maximum instantaneous deceleration are all recorded. When charted out, the data reveals the system performance.
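Two of the recorded quantities, time-to-collision and deceleration in g, follow from simple kinematics. The sketch below is illustrative only: the function names, the 100 Hz logging rate, and the sample values are assumptions, and time-to-collision is taken in its common constant-speed form (gap divided by closing speed).

```python
from statistics import mean

G_MPS2 = 9.80665       # standard gravity, m/s^2
MPH_TO_MPS = 0.44704   # miles per hour to metres per second

def time_to_collision(gap_m, vut_speed_mps, lead_speed_mps):
    """Constant-speed time-to-collision in seconds.

    Returns infinity when the VUT is not closing on the lead vehicle.
    """
    closing_speed = vut_speed_mps - lead_speed_mps
    if closing_speed <= 0:
        return float("inf")
    return gap_m / closing_speed

def deceleration_g(speeds_mps, sample_rate_hz):
    """Average and peak deceleration in g from a logged speed trace.

    Needs at least two samples; positive values mean the vehicle is slowing.
    """
    dt = 1.0 / sample_rate_hz
    decel = [(v0 - v1) / dt / G_MPS2 for v0, v1 in zip(speeds_mps, speeds_mps[1:])]
    return mean(decel), max(decel)

# Hypothetical trace: a vehicle braking steadily at 0.3 g from 30 mph.
rate_hz = 100  # assumed logging rate
speeds = [30 * MPH_TO_MPS - 0.3 * G_MPS2 * i / rate_hz for i in range(50)]
avg_g, peak_g = deceleration_g(speeds, rate_hz)
ttc_s = time_to_collision(20.0, 30 * MPH_TO_MPS, 20 * MPH_TO_MPS)
print(f"average {avg_g:.2f} g, peak {peak_g:.2f} g, TTC {ttc_s:.1f} s")
```

Real logs are noisy, so a practical version would likely smooth the speed trace before differencing it.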

Simulated Disabled Vehicle approach testing

Driving on highways is often risky. AAA, the largest emergency road services (ERS) provider, alone handles over 30 million emergency road service requests nationwide every year. Encountering disabled vehicles on highways is a risky scenario for motorists. The team wanted to find out how active driving assistance systems react when faced with such a situation.

To create a disabled vehicle situation, the team created a simulated scenario with the DRI Soft Car 360 placed halfway on the roadway, with 50% of the soft car in the travel lane and the remaining 50% on the right shoulder. A vehicle under test is then subjected to this situation and its ADA system reaction is noted.

Results

So how did these vehicles perform? While active driving systems mostly worked the way they were designed, there were notable shortcomings in their performance when these systems were pushed to the limits. Consumers and motorists should always be vigilant and attentive when driving, and be ready to take over at a moment’s notice whenever these L2 automation systems are active. 

To learn more about the ADA L2 Testing, please download the full report.

If you are an AV or ADAS developer, or a technology vendor working on core components of automation, and would like to confidentially test the limits of your system, or to learn more about the ADA L2 Testing project, please get in touch with Atul Acharya or Paul Wells at: gomentum@norcal.aaa.com 

Top 5 features at GoMentum Station for testing AVs

 

At GoMentum Station, 2100 acres and 20+ miles of roadway are available to our customers to test their ADAS, autonomous and connected technologies, only 65 miles from Silicon Valley. Here are the top 5 features that make GoMentum Station the premier testing destination for our customers:

#1: Bunker City has it all: multi-directional lanes, various striping patterns, traffic lights, bike and pedestrian infrastructure, and speeds from 15 to 55 mph. Set among hundreds of former munitions bunkers (hence the name), Bunker City proves challenging in many ways, from multipath navigation options to localization in a homogeneous environment.

 

#2: Downtown is our take on ‘Main Street’ America. From teams’ operational base at our WWII firehouse, vehicles can train on freshly paved streets that mimic a suburban environment. With up to four travel lanes, center turn lanes, on-street parking, bus stops and bike lanes, Downtown will put even the best passenger or delivery service vehicles to the test.

 

#3: 12th Street – Need a one-mile straightaway that looks like every stretch of highway in America? We’ve got you covered. Teams use 12th Street to test ADAS features such as lane change assist, adaptive cruise control and emergency braking.

 

 

#4: Tunnel Road – Want to challenge your communication and perception systems? GoMentum has two 600-foot tunnels built of two-foot-thick concrete encased in a sheet of corrugated metal. Stark light contrast at all hours of the day will blind both human eyes and cameras – will your vehicle see the pedestrian crossing at the end of the tunnel?

 

#5: Kinne Boulevard – Our longest stretch of road at 4.5 miles, Kinne Boulevard offers the opportunity to test continuous driving with various straightaways and curves.

To find out more about the unique features at GoMentum Station, contact us to schedule a tour of the proving ground.

GoMentum Presents at SAE World Congress WCX 2020 Digital Summit

GoMentum Station presents two key papers at SAE WCX 2020, on testing automated driving systems, and on a novel collision avoidance safety metric.

We are excited to announce that research performed by AAA’s AV Testing team, in collaboration with two key partners, is being presented at the SAE World Congress 2020.

Of the many technical standards groups and industry conferences, SAE and World Congress stand apart. This year, in place of the Detroit event, we are excited to support SAE WCX virtually and showcase our research via the Digital Summit.

Both the papers, and their oral presentations, are available for on-demand viewing at the SAE WCX website in the Body / Chassis / Occupant and Pedestrian Safety Structure technical sessions category. 

Modes of Automated Driving System Scenario Testing: Experience Report and Recommendations (SAE Paper 2020-01-1204) 

This research, performed in collaboration with University of Waterloo’s Professor Krzysztof Czarnecki, Michal Antkiewicz and team at the Waterloo Intelligent Systems (WISE) lab, explores testing autonomous vehicles using four different modes including simulation, mixed reality testing, and test-track testing. The team tested UW’s automated vehicle, dubbed “UW Moose”, through six rigorous scenario tests in different modes, and compared and contrasted their benefits and drawbacks. The paper closes with 12 recommendations on choosing testing modes for automated driving systems.

The SAE paper 2020-01-1204 may be purchased here

Development of a Collision Avoidance Capability Metric (SAE Paper 2020-01-1207)

This research paper discusses the development and application of a novel metric for evaluating and quantifying the capability of a vehicle / controller (including a human driver) to avoid potential future collisions. The metric was developed in partnership with DRI, and is applicable to potentially any scenario, including with multiple actors and roadside objects. 

The SAE paper 2020-01-1207 may be purchased here

 

To discuss these, and other research at GoMentum, feel free to contact Atul Acharya or Paul Wells at gomentum@norcal.aaa.com 

Accelerating ADAS and AV Development with the GoMentum Digital Twin

Side-by-side of real world and virtual

GoMentum Station and Metamoto announce a new digital twin simulator to accelerate the development and testing of ADAS features and automated driving systems

Authored by Paul Wells

Applying autonomy to safety critical applications — most notably, autonomous vehicles — requires extensive verification and validation. The majority of this testing takes place in scalable simulation tools whereas much of validation and verification is still accomplished in the physical world. As such, correlation between these test modes remains an important consideration when evaluating and advancing the overall efficiency of test efforts.

Metamoto and GoMentum Station are excited to partner to offer the GoMentum Bunker City scene within Metamoto’s massively scalable, cloud-based simulator to help promote this connectivity. Using the GoMentum scene, developers are able to both drive efficiency and go deeper into specific validation subdomains. Contact us for more information about the digital twin.

Efficiency

Physical testing is notoriously resource intensive. To help drive costs down, developers and test engineers are faced with two key options: perform as much virtual testing as possible, and identify the smallest, but still significant, set of physical tests required to produce meaningful assurance and test results. Simulation, when integrated with physical test environments, is an incredibly powerful ally in both of these efforts.

 

Virtual bunkers

In addition to supporting the overall development and advancement of an autonomous stack, virtual testing within a digital twin environment effectively allows for a faster and more efficient bug discovery process before deploying hardware and drivers into a physical environment. This reduces the likelihood of time-consuming troubleshooting that pops up during the precious time spent either on a track or on public roads. GoMentum, for example, is an unusual environment. The use of the Metamoto scene and simulator allows for virtual exploration of the environment such that static objects in the environment, like bunkers and tall grasses, and dynamic agents, such as cows, turkeys and the like, do not disrupt time spent on the track. Turnaround times thereby become faster, and the cycle between virtual and physical testing continues efficiently.

Empty road

Parity Testing

An added benefit of leveraging digital twins is the potential for insight into the conformance of virtual and physical test results. Due to the increasing importance of, and reliance on, simulation testing, understanding the relationship between virtual and physical test results is critical. This exploration of parity testing allows for greater awareness of the relationship between sample use cases and model fidelities leveraged in simulation. While the goal of this awareness is principally safety, it also has the potential to drive even greater use of simulation tools.

As part of the new GoMentum Bunker City Digital Twin, GoMentum and Metamoto invite industry and academic participants to partake in research projects covering feature development, parity testing, safety testing, and more. 

Digital image

 

For more information, reach out to Gomentum@norcal.aaa.com and metamoto@metamoto.com.

Select experiences in testing AV perception systems 

GoMentum Station and AAA Northern California, Nevada & Utah partner with University of Waterloo Center for Automotive Research (WatCAR) to validate AV safety performance and compare results across virtual and physical scenario tests.

Authored by Paul Wells

A key challenge in validating autonomous vehicles — and, more broadly, safety-critical applications of AI — is the brittleness of machine learning (ML) based perception algorithms [1] [2]. This challenge is significant because errors in sensing & perception affect all downstream modules; if a system cannot properly detect its environment, it will have a diminished likelihood of completing its goal within this same environment. Although focused on full-vehicle testing via structured scenarios, our work with University of Waterloo highlighted this key issue. 

Our research subjected Waterloo’s automated research vehicle, Autonomoose, to several predefined scenarios: roundabout with road debris, traffic jam assist, standard pedestrian crossing, intersection pedestrian crossing, and stop sign occlusion.

These scenarios were tested first in simulation using the University’s WiseSim platform and the GeoScenario SDL, then re-tested physically at a closed-course. Although intended as a broad exploration of the utility of controlled, physical testing for autonomous vehicle validation, this project nonetheless surfaced findings which (albeit specific to our given tech-stack) reinforce the otherwise well-documented challenges in validating autonomous systems.

Vehicle and test dummy

A few highlights of our experience paper, specifically as related to test result discrepancies due to the perception system, can be summarized as follows:

Simulation architecture in this case provided “perfect perception”. As such, our virtual tests assumed that all actors in the scene were perceived and understood by the system, in this case a modified instance of Baidu’s Apollo. This assumption led to large discrepancies between virtual and physical tests, especially in scenarios containing pedestrians. In our case, once the vehicle was introduced to the physical environment, perception-related deficiencies resulted in a large number of inconsistent or completely missed detections. During the pedestrian crossing, for instance, the AV struggled with intermittent loss of pedestrian perception due to issues with the object detection module. In simulation, however, the pedestrian was readily detected. Our video at the top of the fold shows performance on the physical track, while the below shows behavior in simulation.

Sensor fidelity in simulation was limited. Further highlighting the importance of closely tracked sensor model fidelity, the modeled LIDAR beam pattern used in simulation did not match the real specs of the on-vehicle sensor. This issue was uncovered due to conflicting virtual-physical test results within a road debris scenario, designed to assess the vehicle’s ability to detect small, stationary objects. As described in our paper, “The scenario exposed some lack of fidelity in the simulated lidar which caused inconsistent road debris detection behavior in simulation compared to the closed course. The lidar mounted on the roof of the SV had a non-linear vertical beam distribution, whereby the beams were grouped more densely around the horizon, and were too sparse to reliably detect the debris on the ground. In contrast, the simulated lidar had a linear vertical beam distribution, i.e., the beams were spaced out evenly. Consequently, implementing the non-linear beam distribution in the simulated lidar resulted in SV behavior in simulation consistent with the SV behavior on the closed course.”  Pictured below, the Autonomoose fails to detect a cardboard box on the road during physical testing.
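To illustrate why the vertical beam distribution matters for small objects on the road, the sketch below compares how many beams from two patterns reach flat ground within a given range. Everything numeric here is an assumption for illustration (sensor height, beam count, field of view, and the power-law warp used to cluster beams near the horizon); it is not the specification of the lidar on the vehicle or of the WiseSim sensor model.

```python
import math

def linear_beams(n, min_deg, max_deg):
    """Elevation angles spaced evenly between min_deg and max_deg."""
    step = (max_deg - min_deg) / (n - 1)
    return [min_deg + i * step for i in range(n)]

def horizon_dense_beams(n, min_deg, max_deg, power=3.0):
    """Hypothetical non-linear pattern: same field of view, beams clustered near 0 degrees.

    Assumes min_deg < 0 < max_deg.
    """
    beams = []
    for i in range(n):
        u = -1.0 + 2.0 * i / (n - 1)                # evenly spaced in [-1, 1]
        warped = math.copysign(abs(u) ** power, u)  # pushes values towards 0
        beams.append(warped * max_deg if warped >= 0 else -warped * min_deg)
    return beams

def ground_hit_ranges(sensor_height_m, elevation_deg):
    """Horizontal distance at which each downward beam meets flat ground."""
    return sorted(
        sensor_height_m / math.tan(math.radians(-angle))
        for angle in elevation_deg
        if angle < 0
    )

def beams_on_ground_within(sensor_height_m, elevation_deg, max_range_m):
    """Number of beams that strike the ground no further than max_range_m away."""
    hits = ground_hit_ranges(sensor_height_m, elevation_deg)
    return sum(1 for r in hits if r <= max_range_m)

# Assumed setup: roof-mounted sensor 2 m above ground, 32 beams, -25 to +15 degrees.
height_m, n_beams, lo_deg, hi_deg = 2.0, 32, -25.0, 15.0
even = beams_on_ground_within(height_m, linear_beams(n_beams, lo_deg, hi_deg), 30.0)
dense = beams_on_ground_within(height_m, horizon_dense_beams(n_beams, lo_deg, hi_deg), 30.0)
print(f"beams reaching the ground within 30 m: linear={even}, horizon-dense={dense}")
```

Under these assumptions the evenly spaced pattern puts more beams on the nearby road surface, which is consistent with the quoted observation that a simulated linear pattern detected debris that the real, horizon-weighted sensor missed.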

Environmental fidelity was limited. Finally, our use of WiseSim primarily involved a re-creation of the road network (but not the visual scene) present during closed-course testing. This introduced small complexities when Autonomoose was ultimately introduced onto the physical track. Principally, uneven road slopes at the physical track created unexpected LiDAR returns and false detections onboard the vehicle. Because WiseSim did not recreate ground plane measurements, we encountered a bit of debugging at the track. This reiterates the need for close tracking of virtual model fidelity when using simulation to prepare for track tests and, more broadly, when performing correlations between virtual & physical tests.

Although these findings may not all generalize to the broader domain of AV testing, they nonetheless provide concrete instances of theoretical challenges facing the field. We will continue exploring these challenges with industry partners and sharing results. We also welcome inquiries about the scenarios used, technical assets created, and data sharing.

For the full paid SAE report visit their website to download.

For further info about the GoMentum testing team, please email us!

 

Understanding Unsettled Challenges in Autonomous Driving Systems

Recently published SAE EDGE Reports, co-authored by the test and research team at AAA Northern California, Nevada and Utah, highlight the key issues that the autonomous vehicle industry continues to face.

Authored by Atul Acharya

SAE EDGE Reports 2019
SAE EDGE Reports on Simulation and Balancing ADS Testing

The promise of highly automated vehicles (HAVs) has long been in reducing crashes, easing traffic congestion, ferrying passengers and delivering goods safely — all while providing more accessible and affordable mobility enabling new business models.

However, the development of these highly complex, safety-critical systems is fraught with extremely technical, and often obscure, challenges. These systems are typically composed of four key modules, namely:

(i) the perception module, which understands the environment around the automated vehicle using sensors like cameras, lidars, and radars,

(ii) the prediction module, which predicts where all other dynamic actors and agents (such as pedestrians, bicyclists, vehicles, etc.) will be moving in the next 2-10 seconds,

(iii) the planning module, which plans the AV’s own path, taking into account the scene and dynamic constraints, and

(iv) the control module, which executes the trajectory by sending commands to the steering wheel and the motors.
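The flow of data through the four modules can be sketched in a few lines. This is a deliberately simplified, one-dimensional illustration: the class and function names, the constant-velocity prediction, the gap-keeping planner, and the proportional controller are all assumptions chosen for brevity, and it only handles objects ahead of the vehicle in its own lane. Production stacks are far more elaborate.

```python
from dataclasses import dataclass

@dataclass
class Track:
    position_m: float  # distance ahead of the AV's current position, along its lane
    speed_mps: float   # speed along the lane, same direction as the AV

def perceive(raw_detections):
    """Perception: turn (position, speed) sensor detections into tracked objects."""
    return [Track(position_m=p, speed_mps=v) for p, v in raw_detections]

def predict(tracks, horizon_s):
    """Prediction: constant-velocity extrapolation of each track over the horizon."""
    return [Track(t.position_m + t.speed_mps * horizon_s, t.speed_mps) for t in tracks]

def plan(ego_speed_mps, predicted, horizon_s, min_gap_m):
    """Planning: highest speed, capped at the current one, that keeps min_gap_m to every track."""
    target = ego_speed_mps
    for t in predicted:
        safe_speed = max(0.0, (t.position_m - min_gap_m) / horizon_s)
        target = min(target, safe_speed)
    return target

def control(ego_speed_mps, target_speed_mps, gain=0.5):
    """Control: proportional acceleration command in m/s^2 (negative means brake)."""
    return gain * (target_speed_mps - ego_speed_mps)

# Hypothetical situation: AV at 15 m/s, slower vehicle 25 m ahead doing 5 m/s.
ego_speed = 15.0
tracks = perceive([(25.0, 5.0)])
future = predict(tracks, horizon_s=2.0)
target_speed = plan(ego_speed, future, horizon_s=2.0, min_gap_m=10.0)
command = control(ego_speed, target_speed)
print(f"target speed {target_speed} m/s, acceleration command {command} m/s^2")
```

Because each stage consumes only the output of the previous one, an error in perception (a missed or misplaced track) propagates unchanged into prediction, planning, and control.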

If you are an AV developer working on automated driving systems (ADS), or perhaps even advanced driver assistance systems (ADAS), you are already using various tools to make your job easier. These tools include simulators of various kinds — such as scenario designers, test coverage analyzers, sensor models, vehicle models — and their simulated environments. These tools are critical in accelerating the development of ADS. However, it is equally important to understand the challenges and limitations in using, deploying and developing such tools for advancing the benefits of ADS.

We in the AV Testing team at GoMentum actively conduct research in AV validation, safety, and metrics. Recently, we had an opportunity to collaborate with leading industry and academic partners to highlight these key challenges. Two workshops, organized by SAE International and convened by Sven Beiker, PhD, founder of Silicon Valley Mobility, were held in late 2019 to better understand various AV testing tools. The workshop participants included Robert Siedl (Motus Ventures), Chad Partridge (CEO, Metamoto), Prof. Krzysztof Czarnecki and Michał Antkiewicz (both of University of Waterloo), Thomas Bock (Samsung), David Barry (Multek), Eric Paul Dennis (Center for Automotive Research), Cameron Gieda (AutonomouStuff), Peter-Nicholas Gronerth (fka), Qiang Hong (Center for Automotive Research), Stefan Merkl (TUV SUD America), John Suh (Hyundai CRADLE), and John Tintinalli (SAE International), along with AAA’s Atul Acharya and Paul Wells.

Simulation Challenges

One of the key challenges encountered in developing ADS is in developing accurate, realistic, reliable and predictable models for various sensors (cameras, lidars, radars), and actors and agents (such as vehicles of various types, pedestrians, etc.) and the world environment around them. These models are used for verification and validation (V&V) of advanced features. Balancing the model fidelity (“realism”) of key sensors and sub-systems while developing the product is a key challenge. The workshop addressed these important questions:

  1. How do we make sure simulation models (such as for sensors, vehicles, humans, environment) represent real-world counterparts and their behavior?
  2. What are the benefits of a universal simulation model interface and language, and how do we get to it?
  3. What characteristics and requirements apply to models at various levels, namely, sensors, sub-systems, vehicles, environments, and human drivers?

To learn more about these and related issues, check out the SAE EDGE Research Report EPR2019007:

https://www.sae.org/publications/technical-papers/content/epr2019007/

Balancing ADS Testing in Simulation Environments, Test Tracks, and Public Roads

If you are a more-than-curious consumer of AV news, you might be familiar with various AV developing companies stating proudly that they have “tested with millions or billions of miles” in simulation environments, or “millions of miles” on public roads. How do simulation miles translate into real-world miles? Which matters more? And why?

If you have thought of these questions, then the second SAE EDGE report might be of interest.

The second workshop focused on a broader theme: How should AV developers allocate their limited testing resources across different modes of testing, namely simulation environments, test tracks, and public roads? Given that each mode of testing has its own benefits and limitations, and can accelerate or hinder development of ADS accordingly, this question becomes paramount when the allocation of limited resources is out of balance.

This report seeks to address the three most critical questions:

  1. What determines how to test an ADS?
  2. What is the current, optimal, and realistic balance of simulation testing and real-world testing?
  3. How can data be shared in the industry to encourage and optimize ADS development?

Additionally, it touches upon other challenges such as:

  • How might one compare virtual and real miles?
  • How often should vehicles (and their subsystems) be tested? And in what modes?
  • How might (repeat) testing be made more efficient?
  • How should companies share testing scenarios, data, methodologies, etc. across industry to obtain the maximum possible benefits?

To learn more about these and related challenges, check out the SAE EDGE Research Report EPR2019011:

https://www.sae.org/publications/technical-papers/content/epr2019011/

Physical Test Efficiency via Virtual Results Analysis

AAA Northern California, UC Berkeley and LG Silicon Valley Labs partner to examine the use of digital twin and parameterized testing to drive efficiency in closed-course testing. For a summary of the study download the AAA-UCB-LG AV Testing Project Whitepaper.

Authored by Paul Wells

Physical test execution at GoMentum Station. Pedestrian scenario along Pearl St. in “Downtown” zone.

Simulation and physical testing are widely known to be complementary modes of automated vehicle verification & validation (V&V). While simulation excels in scalable testing of software modules, physical testing provides the ability for full-vehicle, everything-in-the-loop testing and “real life” evaluations of sensing stacks or downstream perception / sensor fusion layers. These tradeoffs, in part, contribute to the test distribution approach put forth by Safety First For Automated Driving (SaFAD).

Matrix that maps test modes to use cases

“Test Platform and Test Item”. Safety First for Automated Driving, Chapter 3: Verification & Validation (2019).

Due to the ability of simulation to scale rapidly, much work is currently underway to resolve its core limitation (fidelity). Developments in modeling (sensors, environment, vehicle dynamics, human drivers, etc.) and fast, photorealistic scene rendering all stand to expand the scope of validation exercises that can be performed virtually. Much less studied, however, are the means by which physical testing can improve upon its core limitation (efficiency). Whereas the speed of virtual testing allows for near constant re-submission and re-design of test plans, the physical world is much less forgiving. Being strategic about time spent on a track is therefore vital to maximize the efficiency of physical testing. Although many inputs into efficient physical testing are fixed (e.g. the inherent slowness in working with hardware), it is unclear whether the utility of physical test data also scales linearly alongside the volume of test execution. In other words, are the results of all physical tests equally valuable, or are there certain parameters within a given scenario which, if realized concretely at the track, would result in more valuable insights? If so, how might one discover these higher-value parameters using a virtual test environment?

These questions were central to our three-way research partnership between GoMentum Station (owned and operated by AAA Northern California), UC Berkeley VeHiCaL, and LG Silicon Valley Lab (LG SVL). Accordingly, we modeled a digital twin of GoMentum in the LG Simulator and leveraged an integration between Berkeley’s Scenic scenario description language (Scenic) and LG’s simulation platform. We selected a pedestrian crossing scenario and used the Scenic-LG integration to parameterize the scenario along several relevant vectors, executing nearly a thousand different virtual test cases. The results of one such test set were as follows, where rho is a proxy for the minimum distance between the ego vehicle and pedestrian target. Within this plot, we found the clustering patterns along rho to be most interesting. As such, we selected eight cases for physical test execution: two failures (F), three successes (S), and three marginal (M) cases, where failure cases exhibited minimum distance values of less than or equal to 0.4 meters. In short, the results from physical testing established safe/marginal/unsafe parity across test modes.

3-D plot depicting simulation test results

Each point represents a test case. X = pedestrian start delay (s), Y = pedestrian walk distance (m), Z = pedestrian hesitate time (s). Rho = proxy for minimum distance.
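The idea of labelling virtual results by rho and then picking a handful for the track can be sketched as follows. Only the 0.4 m failure threshold and the 2/3/3 split come from the text. The marginal threshold, the treatment of rho as a distance in metres, and the lowest-rho-first selection rule are assumptions for illustration; the team's own selection was guided by the clustering patterns in the plot.

```python
from dataclasses import dataclass

FAILURE_RHO_M = 0.4   # from the text: failures have minimum distance <= 0.4 m
MARGINAL_RHO_M = 1.0  # hypothetical upper bound of the "marginal" band

@dataclass
class Case:
    start_delay_s: float    # pedestrian start delay (X axis)
    walk_distance_m: float  # pedestrian walk distance (Y axis)
    hesitate_time_s: float  # pedestrian hesitate time (Z axis)
    rho_m: float            # proxy for minimum ego-pedestrian distance

def classify(case):
    """Label a virtual test result as failure (F), marginal (M) or success (S)."""
    if case.rho_m <= FAILURE_RHO_M:
        return "F"
    if case.rho_m <= MARGINAL_RHO_M:
        return "M"
    return "S"

def select_for_track(cases, quota):
    """Pick up to quota[label] cases per label, lowest rho first."""
    remaining = dict(quota)
    picked = []
    for case in sorted(cases, key=lambda c: c.rho_m):
        label = classify(case)
        if remaining.get(label, 0) > 0:
            picked.append((label, case))
            remaining[label] -= 1
    return picked

# Hypothetical virtual results; a real sweep would hold close to a thousand cases.
rhos = [0.1, 0.2, 0.3, 0.5, 0.6, 0.7, 0.9, 1.5, 2.0, 2.5, 3.0]
virtual_cases = [Case(1.0, 4.0, 0.5, r) for r in rhos]
for label, case in select_for_track(virtual_cases, {"F": 2, "M": 3, "S": 3}):
    print(label, case)
```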

While the concept of correlating virtual and physical tests is not in itself novel, our results provide evidence to suggest that parametrization and analysis of virtual test results can be used to inform physical test plans. Specifically, the framework of recreating “low-rho” failure cases physically and within a reasonable degree of conformance to virtual runs allows for the capture of rich ground-truth data pertaining to Vehicle Under Test (VUT)-specific critical cases, all without the guesswork involved in manually tuning scenario parameters at the physical test site. Because we were able to tease out deficiencies of the AV stack after running only ten-odd test cases, the utility of data captured relative to time spent onsite was significant. As compared to relatively unstructured physical test exercises or arbitrarily assigned scenario parameters, this framework represented an efficient means of both discovering and recreating physical critical cases.

Stepping outside this framework and the original research scope, our results also provide evidence of several challenges in validating highly automated vehicles (HAV). Even within our relatively low volume of tests, one test case (M2) produced a false negative when transitioning from virtual to physical testing — underscoring the importance of testing virtually and physically, as well as the difficulty in interpreting simulation results or using simulation results as a proxy for overall vehicle competence. We were also surprised by the exceptionally sensitive tolerances within each parameter. In certain cases the difference between a collision / no collision was a matter of milliseconds in pedestrian hesitation time, for instance. This underscores both the brittleness and, to a lesser extent, the non-determinism of machine learning algorithms — two of the broader challenges in developing AI systems for safety-critical applications.

Physical test execution at GoMentum Station. Pedestrian scenario along Pearl St. in “Urban Zone”.

Importantly, these challenges face industry and independent assessors alike. Current test protocols for agencies like Euro NCAP are very rigid. Vehicle speed, a primary test parameter in an AEB VRU test protocol for instance, varies along a step function over a range of roughly 20 to 60 km/h in increments of 5 km/h. While perhaps suitable for L2 systems where drivers remain the primary line of defense, this approach is at odds with the parameter sensitivities exhibited above. If independent assessors hope to draw meaningful conclusions from the results of physical test exercises, these exercises will need to be highly contrived. That is, not only will the scenarios need to be chosen according to a particular operational design domain (ODD), but the parameters used to concretize physical tests should in fact be assigned by inquiry, perhaps even VUT-specific inquiry, instead of by top-down, industry-wide mandate. This could necessitate an assessment framework built around coverage and safety case development rather than test-by-test scoring and comparison, an approach encouraged by the UL 4600 standard and by groups like Foretellix.
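To make the mismatch concrete, the sketch below enumerates a fixed speed grid of the kind described above and checks whether any grid point lands inside a narrow critical band. The band limits are hypothetical and invented purely for illustration; the grid endpoints assume 20 and 60 km/h inclusive, which the protocol description above gives only approximately.

```python
def step_grid(start_kph, stop_kph, step_kph):
    """Speeds tested under a fixed-step protocol, endpoints inclusive."""
    return list(range(start_kph, stop_kph + 1, step_kph))


def grid_hits(grid, band):
    """Grid speeds that fall inside a (low, high) critical band."""
    low, high = band
    return [v for v in grid if low <= v <= high]


speeds = step_grid(20, 60, 5)

# Hypothetical VUT-specific band in which failures occur (assumption).
hypothetical_band_kph = (41.0, 43.5)

hits = grid_hits(speeds, hypothetical_band_kph)
print(len(speeds), "grid speeds;", len(hits), "inside the critical band")
```

Under these assumptions, none of the nine grid speeds falls inside the band, so a fixed-step protocol would report no failures for a vehicle that does fail between 40 and 45 km/h. Parameters assigned by inquiry, for example from low-rho virtual runs, would target such a band directly.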

Many open questions remain in examining HAV test efficiency and overall HAV validation processes. We look forward to using the insights above in 2020 to delve deeper into research and to continue forging relationships with industry. We will be engaging in subsequent projects within the SAE IAMTS coalition to further explore both the toolchain and process used for correlating virtual and physical results.

This work also generated a number of outputs that we look forward to sharing with the industry. All videos from our track testing at GoMentum are available here, with recorded simulation runs here. The underlying datasets — both from ground truth at the track and outputs of the simulation — are also being used for further analysis and safety metric development.

Introducing The GoMentum Blog

The Bay Area has been the epicenter of the self-driving vehicle industry since Waymo’s Chauffeur project in 2013. Over the last seven-odd years the industry has evolved tremendously; sixty-six companies are currently permitted to test vehicles on California public roads. For Waymo and startups alike, public road testing is but one environment in a broader testing regime. Simulation, of course, is another dominant mode, with industry leaders citing millions or billions of virtual miles driven. Controlled testing on private roads is likewise a piece of a holistic approach to validation, and one that has been employed in the auto industry for decades (GM first bought Milford in 1936).

When the AV industry began, Bay Area teams were equipped primarily with research vehicles and needed only parking lots for controlled, full-vehicle testing before moving to public road driving. Over the last two years especially, with fleets of vehicles now driving on public roads and a focus on scale, most of the industry has outgrown parking lots. GoMentum, a former military base that has served as a proving ground since 2014, is a more robust tool for the growing needs of validation teams. AAA Northern California, Nevada & Utah became formally involved in 2018 and has since invested in infrastructure and operations to support efficient use of the site. We’re proud to partner with multiple companies.

As the use of GoMentum as a development tool grew, we also began to think about other opportunities, beyond the physical facility, that could advance the industry’s understanding of safety. Guided by many questions that have crystallized in the field (most notably, how safe is safe enough?), GoMentum created a research agenda in 2019. This agenda sought to complement the work of ISO 21448, UL 4600, Pegasus, and Safety First for Automated Driving by focusing on three key areas: methods for full-vehicle validation, the relationship between physical and virtual testing, and safety metric development.

Having wrapped several of these projects, we’re now excited to share our results. Please stay tuned as we continue to release more about our work and offer insights into questions like: How can AV safety be measured? What are the challenges introduced when distributing validation across physical and virtual environments? What is the state of the current validation toolchain? Which requirements are important for vehicle safety? What needs to be done by legislators and policy makers to ensure the safety of the public?