Introduction

Over the past two decades, advancements in vehicle fleet data collection methodologies for research purposes emerged primarily from Naturalistic Driving Studies (NDS) and Field Operational Tests (FOT). These studies were driven by two main factors: the need to better understand the causal factors behind incidents and accidents, and progressive innovations in driver assistance systems, which leveraged cost-effective sensor, communication, and data server technologies.

It became essential to define and document best practices for carrying out these extensive trials, leading to the development of the FESTA Handbook in 2008. The handbook has since received several updates, the latest being version 8 (FESTA, 2021). It covers the full process of running field operational tests: from formulating research questions and preparing the test, to analysing the collected data in order to answer these research questions. While extensive, FESTA cannot cover data management and data sharing in great detail. Therefore, in 2016, the first version of this Data Sharing Framework was released. It was later updated in the CARTRE project (applying the General Data Protection Regulation (GDPR)) and the ARCADE project (adding automated driving topics) (DSF, 2021).

By the mid-2010s, the focus of research turned towards automated driving technology, fuelled by advances in machine learning and neural networks. In the latter part of the decade, there was considerable hype around a fast introduction of automated vehicles and services on public roads. The challenges, however, were many and, although significant progress has been made, much remains to be done. The shift also brought new dimensions to data sharing. The domain now grapples with large datasets that include both sensor data for system development and driving behaviour data for assessing changes in traffic. New regulations have also come into play, such as test permits that mandate a minimum level of data collection.

Data are vital for the development of automated vehicles. Initially, NDS and FOT data were used for building driver models and establishing baselines for driver behaviour across various scenarios. Soon after, data suited for training machine learning (ML) models began to be shared, primarily for research purposes, leading to an increase in datasets that address different aspects of driving. Currently, projects are collecting data from vehicles operating on test tracks or in confined areas, or conducting tests on public roads limited by the conditions specified in the Operational Design Domain (ODD) of the system. In the coming years, the aim is to extend the operational conditions for automated vehicles and initiate large-scale demonstrations or “living labs”. The data from these projects will be invaluable in validating systems and assessing their impact on traffic safety, efficiency, the environment, and society at large.

The EU, together with the strong European automotive industry, launched the Connected, Cooperative and Automated Mobility (CCAM) partnership (https://www.ccam.eu/) in 2020 to address the needs, requirements, and strategies of automated driving. Since 2022, numerous projects under this programme (part of Horizon Europe), as well as many national projects, have been advancing research in automated driving, with vehicles being tested on public roads.

Numerous challenges are being addressed, and data are central to overcoming them. No single entity can tackle these challenges by itself: none possesses the comprehensive data required to develop and validate the functions, systems, and vehicles. This framework introduces best practices for data exchange to facilitate and increase research data sharing.