4.3.5 Metadata in data catalogues

In online data catalogues, metadata is what enables users to discover and access data. A well-designed metadata schema supports search and discovery by describing each dataset with standardized, consistent, and structured information.

Fragmented metadata practices hinder data sharing and reuse. Different organizations use different metadata formats, which can lead to inconsistencies and incompatibilities between datasets. A consistent, widely accepted metadata format (or set of formats) is therefore needed across transport and CCAM tests to improve the sharing and reuse of data. The format(s) should be flexible enough to adapt to evolving technology, allowing new fields to be added as the technology advances.

Domain-specific metadata allows deeper integration with existing data services and generic catalogues. Standardizing metadata improves the discoverability of datasets and their interoperability, enabling users to make meaningful connections across different domains.

For FOTs/NDSs, FOT-Net Data implemented a data catalogue structure (https://cordis.europa.eu/docs/projects/cnect/3/610453/080/deliverables/001-FOTNetDataD41DataCataloguev3.pdf). To enable effective searches for CCAM datasets in an online catalogue, the following values should be documented:

vehicle types (passenger cars, industrial vehicles, public transport, other)
tested system (automated driving, integrated driver support, aftermarket, other)
number of vehicles and test subjects
data log contents (naturalistic driving, fixed routes, raw sensor data, processed surrounding objects, accurate positioning)
availability of public anonymous sample data
driving conditions covered by the logs (urban, rural, hilly, snowy, heavy rainfall, fog, night, traffic jam)
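As an illustration only, the searchable values listed above could be modelled as controlled vocabularies in Python. All class, enum and function names below (`SearchableMetadata`, `search`, and so on) are hypothetical and are not taken from the FOT-Net Data catalogue structure; the sketch only shows how enumerated values let a catalogue filter datasets consistently.

```python
from dataclasses import dataclass
from enum import Enum


# Controlled vocabularies mirroring the value lists in this section.
class VehicleType(Enum):
    PASSENGER_CAR = "passenger cars"
    INDUSTRIAL_VEHICLE = "industrial vehicles"
    PUBLIC_TRANSPORT = "public transport"
    OTHER = "other"


class TestedSystem(Enum):
    AUTOMATED_DRIVING = "automated driving"
    INTEGRATED_DRIVER_SUPPORT = "integrated driver support"
    AFTERMARKET = "aftermarket"
    OTHER = "other"


class LogContent(Enum):
    NATURALISTIC_DRIVING = "naturalistic driving"
    FIXED_ROUTES = "fixed routes"
    RAW_SENSOR_DATA = "raw sensor data"
    PROCESSED_SURROUNDING_OBJECTS = "processed surrounding objects"
    ACCURATE_POSITIONING = "accurate positioning"


class DrivingCondition(Enum):
    URBAN = "urban"
    RURAL = "rural"
    HILLY = "hilly"
    SNOWY = "snowy"
    HEAVY_RAINFALL = "heavy rainfall"
    FOG = "fog"
    NIGHT = "night"
    TRAFFIC_JAM = "traffic jam"


@dataclass
class SearchableMetadata:
    dataset_id: str  # hypothetical catalogue identifier
    vehicle_types: set[VehicleType]
    tested_systems: set[TestedSystem]
    num_vehicles: int
    num_test_subjects: int
    log_contents: set[LogContent]
    public_sample_available: bool
    conditions: set[DrivingCondition]


def search(records, vehicle_type=None, tested_system=None,
           conditions=frozenset(), needs_sample=False):
    """Return the IDs of records that match every criterion given."""
    hits = []
    for r in records:
        if vehicle_type is not None and vehicle_type not in r.vehicle_types:
            continue
        if tested_system is not None and tested_system not in r.tested_systems:
            continue
        # Every requested driving condition must be covered by the logs.
        if not set(conditions) <= r.conditions:
            continue
        if needs_sample and not r.public_sample_available:
            continue
        hits.append(r.dataset_id)
    return hits


# Placeholder records, not real datasets.
records = [
    SearchableMetadata("demo-1", {VehicleType.PASSENGER_CAR},
                       {TestedSystem.AUTOMATED_DRIVING}, 10, 40,
                       {LogContent.RAW_SENSOR_DATA}, True,
                       {DrivingCondition.URBAN, DrivingCondition.NIGHT}),
    SearchableMetadata("demo-2", {VehicleType.PUBLIC_TRANSPORT},
                       {TestedSystem.OTHER}, 2, 0,
                       {LogContent.FIXED_ROUTES}, False,
                       {DrivingCondition.RURAL}),
]
print(search(records, conditions={DrivingCondition.NIGHT}, needs_sample=True))
# → ['demo-1']
```

Because the values come from fixed enumerations, a query for night-time driving cannot silently miss a dataset that recorded "Night" or "night-time" as free text; this is the main reason to prefer controlled vocabularies over free-text keywords here.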

In addition to these searchable keywords and values, the online data catalogue should include a summary of each dataset: its title, the DOI and URL where it can be accessed, a short public description, the test start and end dates, the country where the test was conducted, and the main geographic coordinate of the test. The catalogue should also include basic administrative metadata, such as the publisher, contact persons, access requirements, and documentation language, as well as key structural metadata, such as the log file format, so that users gain a more complete understanding of the dataset.

A metadata documentation template is provided in Annex I of this document. By following this template, test leaders can make their CCAM datasets easier to reuse and ensure that valuable research data is not lost through incomplete metadata.
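A minimal sketch of the summary, administrative and structural fields described in this section, assuming a flat Python record. The field names, the optional `extensions` dictionary (one possible way to add new fields as technology evolves) and the `completeness_problems` check are hypothetical; they do not reproduce the Annex I template. The sketch also assumes the main coordinate is a (latitude, longitude) pair.

```python
from dataclasses import dataclass, field, fields
from datetime import date
from typing import Optional


@dataclass
class DatasetSummary:
    # Summary (descriptive) metadata
    title: Optional[str] = None
    doi: Optional[str] = None
    url: Optional[str] = None
    description: Optional[str] = None
    test_start: Optional[date] = None
    test_end: Optional[date] = None
    country: Optional[str] = None
    main_coordinate: Optional[tuple[float, float]] = None  # assumed (lat, lon)
    # Administrative metadata
    publisher: Optional[str] = None
    contact_persons: list[str] = field(default_factory=list)
    access_requirements: Optional[str] = None
    documentation_language: Optional[str] = None
    # Structural metadata
    log_file_format: Optional[str] = None
    # Hypothetical extension point for fields added later
    extensions: dict[str, str] = field(default_factory=dict)


def completeness_problems(rec: DatasetSummary) -> list[str]:
    """List empty fields and obvious inconsistencies (hypothetical check)."""
    problems = [f.name for f in fields(rec)
                if f.name != "extensions"
                and getattr(rec, f.name) in (None, "", [])]
    if (rec.test_start is not None and rec.test_end is not None
            and rec.test_end < rec.test_start):
        problems.append("test_end before test_start")
    return problems


# Placeholder values, not a real dataset.
rec = DatasetSummary(
    title="Example CCAM pilot",
    url="https://example.org/dataset",
    test_start=date(2021, 5, 1),
    test_end=date(2021, 4, 1),
    country="DE",
    log_file_format="HDF5",
)
print(completeness_problems(rec))
```

For this placeholder record the check reports the seven empty fields (from `doi` to `documentation_language`) and the reversed test dates. A catalogue could run a check of this kind before publishing an entry, so that incomplete metadata is caught at submission time rather than discovered by users later.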