5.2 Data classification
The level of data protection required depends on the harm the data could do if revealed and the legal requirements. If the dataset consists of personal or confidential commercial data, it is mandated by law that action is taken to ensure data protection, regardless of the size of the dataset. Confidential commercial data is usually accompanied by agreements stating the conditions for access and use, whereas the use of personal data is regulated by law and the agreement with the participant (via consent). This document classifies data into personal, special categories of personal data, confidential, and licensed data.
Personal data
The GDPR (http://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32016R0679&from=EN) reformed the use of personal data in Europe when the regulation became applicable on May 25th, 2018. The GDPR strengthens the rights of individuals and sets a common legal framework for all European Union countries. Any organisation that handles or processes personal data in the European Union must ensure that personal data are managed according to the law. GDPR Art. 3 states that the law also applies to the processing of personal data related to monitoring the behaviour of persons when that behaviour takes place within the European Union, regardless of whether the processing is done within or outside the European Union. Any organisation planning to share personal data with third countries outside the European Union must pay great attention to what can be shared and how.
Although the GDPR harmonizes the regulations within the European Union, differences remain relative to other jurisdictions such as the US, Australia and Asian countries. For example, in the US, ‘personal data’ are known as ‘personally identifiable information’ (PII) and ‘special categories of personal data’ are known as ‘sensitive personal information’ (SPI or SPII). The definitions are not identical to those used in Europe, and it is therefore advisable to take any necessary actions to ensure that data are managed according to the laws of the country where the data are located.
The term personal data is defined in GDPR Art. 4:
‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person
There are also special categories of personal data that require additional consideration, defined in GDPR Art. 9:
Processing of personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person’s sex life or sexual orientation shall be prohibited.
The suggested data-protection requirements in this chapter aim to guide the Data Providers and Data Consumers in setting up a data-protection concept that meets the regulations and respects the will of the participants as stated in the consent form.
Confidential commercial and licensed data
Confidential commercial data is information which an organisation has taken steps to protect from disclosure, because disclosure might help a competitor. The sensitivity of confidential commercial data usually dictates the data-protection requirements stated in the data-sharing agreements. When contracts for providing the data are being signed, it is advisable for both parties to discuss, and agree on, the level of protection that will be suitable.
Some data might be less sensitive and are here defined as commercial data. There is no exact definition; the classification is at the discretion of the Data Owner.
Any confidential commercial data should be shared in accordance with a license which describes the rights for handling specific signals. These signals can be organized into different sub-categories (from the perspective of where the data were generated), each of which might need a specific data-protection implementation. These sub-categories could include 1) data generated by a driver or passenger, 2) data available on OBD2, 3) data from specific systems indicating state, instructions/recommendations, or actuations, 4) data from a third-party system or sensor provider, and 5) external data.
The questions that could be asked are:
- Who / what is generating the data (a vehicle, a person, a sensor, post-process)?
- Are any data merged / processed with other datasets (e.g., measurements from a GPS are merged with the content in a map database)?
- Are there any contracts/licenses that restrict the usage?
- Can post-processing the data reduce the sensitivity (e.g., aggregating data)?
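As an illustration of the last question, the following minimal Python sketch aggregates individual GPS samples into counts per coarse grid cell, discarding the order of the samples and any timestamps. The function name, the grid-cell size and the sample coordinates are hypothetical and chosen only for illustration. Whether aggregation of this kind reduces the sensitivity enough must still be judged against the applicable law and contracts.

```python
import math
from collections import Counter


def aggregate_positions(points, cell_size_deg=0.01):
    """Count GPS samples per coarse grid cell.

    points: iterable of (latitude, longitude) pairs in decimal degrees.
    cell_size_deg: hypothetical cell size; larger cells give coarser output.
    Returns a dict mapping (lat_cell, lon_cell) indices to sample counts.
    """
    counts = Counter()
    for lat, lon in points:
        # Map each position to the index of the grid cell that contains it.
        cell = (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))
        counts[cell] += 1
    return dict(counts)


# Hypothetical samples from a single trip.
trip = [(57.7089, 11.9746), (57.7051, 11.9712), (57.7152, 11.9746)]
cell_counts = aggregate_positions(trip)
```

The output retains where driving took place in general, but not the exact trajectory of an individual trip.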
Several considerations affect the usage and protection level of the data. It could be relevant to divide the different categories logically to make it more obvious which data are in use. This should be weighed against the effort an analyst needs to use the data efficiently. It is important to have both technical and organisational measures in place to comply with the data-protection requirements agreed upon in the respective contracts.
There could be a mix of the different categories within the same dataset. Depending on the classification, multiple data-protection requirements may apply.
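A mixed dataset can be made explicit by recording the categories per signal and deriving the set of categories present in the dataset as a whole. The Python sketch below assumes the four categories named in this section. The signal names and their assigned categories are hypothetical examples, not a recommended classification.

```python
from enum import Enum


class DataCategory(Enum):
    PERSONAL = "personal data"
    SPECIAL_PERSONAL = "special categories of personal data"
    CONFIDENTIAL = "confidential data"
    LICENSED = "licensed data"


def categories_in_dataset(signal_categories):
    """Return the set of all categories present across a dataset's signals.

    signal_categories: dict mapping a signal name to a set of DataCategory.
    """
    present = set()
    for categories in signal_categories.values():
        present |= set(categories)
    return present


# Hypothetical dataset: signal names and classifications are illustrative only.
signals = {
    "gps_position": {DataCategory.PERSONAL},
    "health_questionnaire": {DataCategory.PERSONAL, DataCategory.SPECIAL_PERSONAL},
    "brake_pressure": {DataCategory.CONFIDENTIAL},
    "map_attributes": {DataCategory.LICENSED},
}
present = categories_in_dataset(signals)
```

Each category in the resulting set would then be matched to the data-protection requirements that apply to it, so a dataset containing all four categories has to satisfy all four sets of requirements.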