8.1 Data management costs
In terms of cost items, data management has much in common with open data efforts and large-scale user tests in various scientific disciplines. Documentation and user support play a greater role, however, because the datasets are generally in non-standard form and originate from studies with specific goals. In addition, strict requirements to protect user privacy and product intellectual property rights (IPR) may call for secure facilities and processes, raising the management costs of such datasets above those of fully open datasets.
Table 18 lists the items requiring funding in data management within this domain. The items are related to data management after a project – or more generally, after the data collection has ended.
Clearly, storing a massive dataset and organising proper backups to avoid data loss incurs costs. Data may also have to be anonymised to enable wider sharing. When a dataset is shared, licences or agreements usually need to be concluded, along with financial arrangements. Further, to demonstrate the benefits of data sharing to funding organisations, it is important to collect information on the use of the data. As a result of such requirements, the list of data management cost items can grow long. However, that does not necessarily mean that data sharing places a heavy burden on organisations. Effective processes, support and tools, provided internally or externally by professionals, can reduce the load on participants in single projects. Basic preparations for sharing data should become part of good scientific practice.
Considering the general costs of data management, it is unlikely that all test data can be stored for future science. A selection process is foreseen that would concentrate the efforts and funding on promising and valuable datasets. This selection could be carried out by those who fund the costs of data sharing and supported by the experts who collected the data for the original project. The selection could be based on the following criteria:
- potential for re-use, from both scientific and business perspectives
- efforts needed to store the dataset
- quality and amount of data.
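As an illustration only, the three criteria above could be combined into a simple weighted score for ranking candidate datasets. The following Python sketch is not part of the described process: the class DatasetCandidate, the 0 to 1 rating scale, the weights and the example ratings are all hypothetical assumptions, and in practice the ratings would come from the funders and the experts who collected the data.

```python
from dataclasses import dataclass


@dataclass
class DatasetCandidate:
    """Hypothetical expert ratings for one dataset, each on a 0..1 scale."""
    name: str
    reuse_potential: float     # scientific and business re-use; higher is better
    storage_effort: float      # effort needed to store; higher means MORE effort
    quality_and_amount: float  # quality and amount of data; higher is better


# Assumed weights, chosen only for illustration; they sum to 1.
WEIGHTS = {
    "reuse_potential": 0.5,
    "storage_effort": 0.2,
    "quality_and_amount": 0.3,
}


def selection_score(candidate: DatasetCandidate) -> float:
    """Weighted sum of the criteria; storage effort is inverted so that
    datasets that are cheaper to store score higher."""
    return (
        WEIGHTS["reuse_potential"] * candidate.reuse_potential
        + WEIGHTS["storage_effort"] * (1.0 - candidate.storage_effort)
        + WEIGHTS["quality_and_amount"] * candidate.quality_and_amount
    )


def rank_candidates(candidates: list[DatasetCandidate]) -> list[DatasetCandidate]:
    """Return the candidates ordered from highest to lowest score."""
    return sorted(candidates, key=selection_score, reverse=True)


# Made-up example ratings, not real project data.
candidates = [
    DatasetCandidate("dataset_a", reuse_potential=0.9, storage_effort=0.7, quality_and_amount=0.8),
    DatasetCandidate("dataset_b", reuse_potential=0.4, storage_effort=0.2, quality_and_amount=0.6),
]

for candidate in rank_candidates(candidates):
    print(f"{candidate.name}: {selection_score(candidate):.2f}")
```

A weighted sum is only one possible way to combine the criteria; a funder could equally apply minimum thresholds per criterion or rely on a qualitative expert review.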
Table 18 presents cost items and tasks involved in data sharing after data collection has ended. It is assumed that tasks enabling data sharing, such as concluding legal agreements, metadata documentation and data quality checking, have already been performed in the original project that collected the data. Some of the cost items in Table 18 are optional, such as advertising datasets or participation in international harmonisation/standardisation efforts of data collection and data catalogues. However, such tasks are common in professional data management services and can also be foreseen in data-sharing activities that have achieved an established status.
Table 18: Data-sharing costs
Cost item | Comments | Timing of cost |
Data selection, enhancement of documentation (metadata), creation of entries in relevant data catalogues | Finalisation and structuring of data. As a prerequisite for sharing, the datasets need to be comprehensively documented. | When project/data collection ends |
Anonymising data | The level of anonymisation and the related effort depend on how widely the data will be shared. | Before data is shared |
Management & coordination personnel costs | Basic management of e-infrastructure, including user support, data catalogue operations and updates, data import to archives, backups, compilation of usage statistics, licence management, agreements and finances | Continuous |
IT operations | Database servers, storage, protection, licences and IT personnel costs | Continuous |
Cyber security | Protect the system from cyber threats | Continuous |
Analysis or data handling facilities | Physically secure workspace | Continuous |
Analysis support services | Expert support at different levels | When data is shared and during analysis efforts |
Promotion and advertisement | Informing potential data re-users and data-sharing funders. Optional: direct funding of further analysis projects, to ensure good use of valuable datasets. Optional: direct advertisement of datasets to potential research projects and those planning new projects, beyond common catalogues. | When project ends/Continuous |
Optional: Standardisation and collaboration regarding dataset formats | Taking part in national and international collaborations regarding dataset formats | Continuous |