5.10 References to accident databases
Accident data are a special type of data, related to CCAM/FOT/NDS data: they cover special situations that usually form a very small, but from a safety perspective highly interesting, subset of the data and are used widely around the globe. They are discussed here as a case study.
Several projects worldwide collect and protect accident data for scientific analysis. The contexts of these projects differ widely: partners range from governmental institutions to universities and companies. This variety of users creates a need for effective data protection. Notably, even within a single accident data project, the actors may be located in several countries and be organised under different legal forms.
Moreover, accident data projects are long-term undertakings, so anonymization is crucial for their survival: there is always a chance that persons involved in an accident will ask for the data related to their case. Safety-related data, and accident data in particular (which carry legal implications), need more care than non-safety-related data (e.g., data for driver-behaviour analysis) when collected in a scientific context, and thus are a good testbed for data protection. The required level of anonymization is largely independent of the level at which the data are shared; it applies even to accident data collected and stored by a single OEM. What matters are the legal requirements in force at the location of the Data Provider.
When data are anonymised, the link between a dataset and a specific person, accident or geographic location is cut. The data can then be used for scientific purposes, but no longer in the context of legal proceedings. Anonymization is crucial, as those responsible for data protection would stall the project without it.
Technically, accident data are protected by the Data Provider, who removes any details that can be directly connected to a single accident, or to a person involved in that accident, before entering the data into the database. In particular, participants’ identities, exact geographic locations and exact dates are removed. Pictures are usually also included in the data and require a more complex anonymization process: for example, faces and company logos (e.g., printed on vehicles) must be blurred until they are unrecognisable, which cannot yet be done fully automatically and therefore requires manual intervention.
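As a minimal illustration of this kind of field-level anonymization, the following Python sketch removes direct identifiers and coarsens dates and locations before a case record is entered into the database; the record layout and field names are hypothetical, not taken from any existing accident database.

    from copy import deepcopy

    # Hypothetical raw case record as delivered by an on-scene investigation team.
    raw_case = {
        "case_id": "2023-0147",
        "driver_name": "Jane Doe",           # direct identifier -> removed
        "vin": "WVWZZZ1JZXW000001",          # direct identifier -> removed
        "accident_datetime": "2023-05-17T14:32:00",
        "gps": (52.51627, 13.37770),         # exact location -> coarsened
        "injury_severity": "MAIS2",
        "collision_type": "rear-end",
    }

    DIRECT_IDENTIFIERS = ("driver_name", "vin")

    def anonymise_case(case: dict) -> dict:
        """Return a copy of the case with direct identifiers removed and
        quasi-identifiers (exact date, exact location) coarsened."""
        anon = deepcopy(case)
        for field in DIRECT_IDENTIFIERS:
            anon.pop(field, None)
        # Keep only year and month; the exact date could identify the accident.
        anon["accident_month"] = anon.pop("accident_datetime")[:7]
        # Round coordinates to roughly 1 km so the exact spot cannot be recovered.
        lat, lon = anon.pop("gps")
        anon["region"] = (round(lat, 2), round(lon, 2))
        return anon

    print(anonymise_case(raw_case))

Blurring faces and logos in pictures, by contrast, cannot be reduced to such a simple rule and, as noted above, still needs manual checking.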
An interesting challenge arises when third-party data need to be linked to the already anonymised accident data supplied by the Data Provider. It would be useful, for example, to know which safety features a car involved in an accident was equipped with, in order to analyse their effectiveness. However, direct access to the equipment information for a single vehicle (beyond the standard equipment, which can be determined from make, model and year) requires the vehicle identification number (VIN), which is not usually available to the Data Provider.
One solution is to provide a list of VINs, without any accident information, to the third party. But as these VINs identify vehicles known to have been involved in accidents, this approach does not comply with common data-protection requirements. In fact, to date this problem has only been solved within a closed environment (such as an OEM). However, hosting data in a closed environment must still honour the legal restrictions that apply at the Data Provider’s location; these differ between legal systems and depend on the type of personal data stored (e.g., names and other details must be removed from medical data, and faces in pictures must be blurred). In this example of information linkage, it should also be noted that the VIN only points to the owner of a vehicle, not directly to the persons involved in the accident. The example shows how important data protection is and how seriously it is handled in current scientific accident databases. The issue is not restricted to accident data and should be considered in other domains, too.
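Within such a closed environment, the linkage can be thought of as an internal join on the VIN, with the VIN dropped again before the enriched record leaves that environment. The sketch below is purely illustrative; the tables and field names are assumptions, not a description of any existing OEM system.

    # Hypothetical tables held inside the closed environment (e.g., an OEM).
    accident_cases = [
        {"case_id": "2023-0147", "vin": "WVWZZZ1JZXW000001", "collision_type": "rear-end"},
    ]
    equipment_db = {
        "WVWZZZ1JZXW000001": {"aeb": True, "lane_keeping_assist": False},
    }

    def link_and_strip(cases: list, equipment: dict) -> list:
        """Join safety-equipment information onto accident cases via the VIN,
        then drop the VIN so that the exported record can no longer be linked
        to a single vehicle outside the closed environment."""
        enriched = []
        for case in cases:
            vin = case["vin"]
            record = {k: v for k, v in case.items() if k != "vin"}
            record.update(equipment.get(vin, {}))
            enriched.append(record)
        return enriched

    print(link_and_strip(accident_cases, equipment_db))
    # -> [{'case_id': '2023-0147', 'collision_type': 'rear-end',
    #      'aeb': True, 'lane_keeping_assist': False}]

The key point is that the join is performed only where both the VIN and the equipment data are already lawfully held, and the exported result contains neither.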
In some legal and political constellations, an increased level of data protection has to be practiced. Such constellations can occur in mixed environments, where public and private institutions run a joint project. The Data Provider then has to meet additional requirements: for example, the data may have to be hosted on a server at a university, with the anonymised data transferred over secured lines to the Data Consumer.
Data-protection requirements vary around the globe. In the US, for example, accident data collected by the government are made public and can be downloaded from websites; access is regulated by the US Federal Research Public Access Act (FRPAA) and the US Fair Access to Science and Technology Research Act (FASTR). The main reasoning behind public access is that publicly funded research should benefit all taxpayers, which in the case of accident data stands in tension with the protection of each individual’s data. No other country has similar regulations. When data are published, it is highly important to remove or hide personal information. It should be noted that the anonymization level of US accident data is about the same as that of non-public databases in Europe, including blurred pictures and truncated vehicle identification numbers.
In practice, data protection has proven feasible for decades when dealing with accident data in a scientific context. In a CCAM context, legislation and requirements on reporting accidents differ. In the US, many states investigate every accident involving an AV prototype vehicle, as do some countries in Europe. However, the protocol for collecting this information is not standardized, although initiatives are looking into this.