2.2 Why share data

Data sharing and re-use form an integral part of the European Strategy for Data (https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A52020DC0066). The Open Data Directive (EU) 2019/1024 (https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32019L1024) sets rules for the re-use of data, including research data. If the data provider also does research, sharing can lead to more collaboration and further analysis. Sharing data can also spark new research projects and improve the chances of obtaining research funding.

Data sharing is essential for partners within a research project (for whom it is a necessity), for funding bodies, and for organizations that focus on data exchange. Some organisations today share data openly at no cost, although often only for research purposes. This is common for AI and machine learning training datasets. Such organisations may share because of funding rules, to gain scientific prestige, or to showcase their technology.

Data providers invest significantly in collecting data and developing the related infrastructure and tools. Maintaining the data, assessing its quality, and sharing it require specialized staff who bring the data and tools to a level where they are easy to use. It is therefore essential to understand how to compensate data providers for their efforts. Providing some benefit would also increase the number of data providers willing to share their datasets.

The data provider often conducts research of its own. Collaborations involving further analysis may create new funding opportunities and stimulate a wide variety of research projects.

The original project collecting the data may perform only a narrow analysis based on its research objectives. From a funding organisation’s perspective, using existing datasets for further analysis is an efficient return on investment. For project partners who already know the data, being able to use it further in new projects is a good payback on the effort invested. During this additional phase of data reuse, the funding organisation could require that additional partners be brought in, to expand the data’s reach.

With vast amounts of data coming from around the world, combining datasets can offer more reliable results than using just one. Studying specific groups, such as older drivers, in various countries can reveal how traffic behaviour varies between cultures. If extra research funds are tied to global partnerships and data sharing, this strengthens the worldwide research community. Collaborating on research builds trust, encouraging more data sharing and broadening knowledge.

These are some of the general advantages of sharing and re-using datasets. It is important, though, to identify the special circumstances that create a win-win situation between the data provider and the researcher in each specific case. At the same time, we must protect the information of those who participate in the research.

Even though these incentives have been clear for some time, data owners and providers have hesitated to share. The risks, highlighted in the section of the European Strategy for Data on B2B data-sharing challenges, have also affected research datasets. To tackle this, recent European approaches emphasize ‘federated data sharing’, a system enabling decentralized data control. This approach is standardized by the GAIA-X initiative (https://gaia-x.eu/). This document introduces the method as an alternative to traditional data exchange and remote data access. It offers data owners and providers greater control over their data and its use.