7.1 Support services
Support starts as early as the application stage, with discussions on the suitability of the data to answer the specific research questions at hand. Support services target the researcher’s ability to perform analysis and re-use existing data. The services are divided into different stages depending on the degree and impact of the support. These stages are:
- information and data provision
- supporting tools
- assistance with dedicated research needs
- data protection and analysis facilities
- long-term preservation and data-sharing services.
Information and data provision
The first stage of support is to make researchers aware of available datasets and tools for data handling. This information is usually provided in online data catalogues. Discussions may also be necessary to answer questions about data usability (based on feedback from initial data analysis or from earlier data re-use) and about which procedures have been established and proven successful. Metadata and other detailed background information on the data collection and initial study design can provide a better understanding of the dataset and improve data handling. Additional services, such as basic data aggregation, data extraction, and data transfer, could also be provided.
Supporting tools
Tools are an integral part of the support services. They include viewing and annotation tools, scripts to extract useful datasets from a database, and licensed software, and can also include entire frameworks for retrieving, processing, and uploading data back into a database. However, analysts should be free to choose their tools without being constrained by factors other than the raw data formats and data descriptions (for example, by complex frameworks with graphical interfaces). As mentioned in chapter 4, it is important that raw data can be read in a clearly described format directly from the data storage source (e.g., database or file storage), regardless of what analysis tools are used in the project. Note that appropriate access restrictions should always apply. Different analysts have different ways of analysing data, so support services should impose as few constraints as possible on the processes analysts use (within the data-protection framework). Examples of different ways to structure data and metadata are given in chapter 4. In addition, data and metadata formats need to support different analysis processes and needs if they are to be accepted and used by as large a community as possible. Dependency on third-party software for access should also be kept to a minimum.
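As an illustration of reading raw data directly from storage using only a data description, the following is a minimal sketch in Python. It relies on the standard library alone, in keeping with the aim of minimal third-party dependencies. The column names, the DESCRIPTION dictionary, and the function read_described_csv are hypothetical and chosen only for this example; an actual data description would follow the formats discussed in chapter 4.

```python
import csv
import io

# Hypothetical data description: column name -> (type, unit).
# In practice this would come from the dataset's documentation.
DESCRIPTION = {
    "timestamp_s": (float, "s"),
    "value": (float, None),
    "sensor_id": (str, None),
}


def read_described_csv(stream, description):
    """Parse CSV rows and cast each field to the type given in the description."""
    reader = csv.DictReader(stream)
    # Fail early if the file does not match the description.
    missing = set(description) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"columns missing from data: {sorted(missing)}")
    rows = []
    for record in reader:
        rows.append(
            {name: cast(record[name]) for name, (cast, _unit) in description.items()}
        )
    return rows


# Illustrative raw data; a real project would open a file or query a database.
RAW = "timestamp_s,value,sensor_id\n0.0,50.5,S01\n0.1,51.0,S01\n"
rows = read_described_csv(io.StringIO(RAW), DESCRIPTION)
```

Because the reader depends only on the description and the raw file, an analyst could swap in any other tool that understands the same description without changing how the data are stored.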
Support may consist of providing dedicated tools for specific tasks (if available), as well as setup and basic maintenance of the analysis tools. Due to the complexity of data analysis, setting up these tools requires a profound understanding of the datasets. Further development of the tools falls under the support stage "Assistance with dedicated research needs".
Assistance with dedicated research needs
Assistance, the most advanced stage of support services, can take the form of dedicated advice on analysis methods and the custom modification of tools. In a strict sense, analysis methods are not applied (this would be part of research services, see 7.2); instead, this service selects, provides, and adjusts analysis methods.
Data protection and analysis facilities
The following support services can also be provided:
- analyst training
- support relating to privacy issues
- data-protection measures
- secure facilities for analysis work
The researcher could be given training in security and privacy matters, thus gaining a deeper understanding of the sensitivity of the data. Training in using analysis tools could also be included (see chapter 6).
Support for new research projects on confidentiality and privacy issues is a common role for data warehouses.
Advice and support could be given on the need for data-protection measures.
Certain data warehouses offer secure sites/rooms for analysis. In these cases, the data may not be transferred, but must be analysed on-site to fulfil security requirements.
Long-term preservation and data-sharing services
Support services for long-term data preservation, access and reuse feed into the next cycle of data re-use, lengthening the lifespan of data into future projects.
For data to retain its value, it needs to be preserved during or at the end of the data-collection/research project in which it is created. Long-term preservation requires a dedicated infrastructure and human resources, as well as planning and preparatory work by the data creators. This ensures that data preserved this way is properly formatted and documented, and that its future management is planned for.
Another aspect is access and findability. For data to have future value, it needs not only to be properly preserved, but also to be found and accessed. Users need to be able to search for relevant datasets and evaluate them with the help of metadata and documentation. There also needs to be information about how to obtain access to the data.
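The findability requirement can be sketched as a simple metadata search. The example below is a minimal illustration in Python, assuming a small in-memory catalogue; the DatasetRecord fields, the sample entries, and the function search_catalogue are hypothetical and do not describe any particular catalogue system.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical catalogue entry: metadata only, not the data itself."""
    identifier: str
    title: str
    keywords: list = field(default_factory=list)
    access_procedure: str = ""  # how a user can obtain access


def search_catalogue(catalogue, term):
    """Return records whose title or keywords contain the term (case-insensitive)."""
    term = term.lower()
    return [
        record
        for record in catalogue
        if term in record.title.lower()
        or any(term in keyword.lower() for keyword in record.keywords)
    ]


# Illustrative entries only.
CATALOGUE = [
    DatasetRecord("ds-001", "Example household survey 2019",
                  ["survey", "households"], "Request via data provider"),
    DatasetRecord("ds-002", "Example sensor time series",
                  ["sensor", "time series"], "On-site analysis only"),
]

for hit in search_catalogue(CATALOGUE, "sensor"):
    # Each hit carries the information on how to obtain access.
    print(hit.identifier, "-", hit.title, "-", hit.access_procedure)
```

Keeping the access procedure in the same record as the descriptive metadata means a user who finds a dataset also learns immediately how to request it.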
The above functions can be fulfilled in different ways – by a centralized solution such as a data repository which preserves and makes data findable and accessible, or by a decentralized solution such as a dataspace, a data ecosystem where trusted partners host and share their own data according to agreed-upon data storage, access, sharing and interoperability standards.
For the framework to be adhered to, some data expertise and resources need to be dedicated to preparing and managing data. Three commonly used roles within research data support are Data Steward, Data Curator, and Data Manager. These roles can overlap one another and be fulfilled in different ways. There are also other data-related roles which either overlap with or are equivalent to one or more of the above roles: data librarian, data custodian, RDM (Research Data Management) specialist, research engineer, etc. The three main roles are described below:
Data steward: a governance and compliance role responsible for ensuring that data processes and usage are in line with organizational policies and regulations, certifications, legal and ethical requirements, and IT-security requirements, all with the aim of producing FAIR data. Data stewards can be responsible for overseeing data documentation, curation, and structure across an organization, with FAIR data and long-term preservation as the outcome. Reproducibility and reusability are key concepts here. A data steward can be generally available to an organization or embedded within a project or infrastructure.
Data manager: a more operational role responsible for overseeing the lifecycle of data within an organization or project. Data managers work with data storage, retrieval, processing, and possibly analysis of data. They may be responsible for designing and implementing data management systems and processes, and for ensuring that data is accessible and usable by those who need it.
Data curator: ensures that data is accurate, relevant, and useful, and may be responsible for acquiring data from various sources, such as databases, websites, and other data repositories, and for ensuring that the data is properly formatted and organized. Data curators also work to ensure that data has proper metadata and documentation to aid findability and reusability, as well as maintaining data, metadata, and documentation over time.