Long-tail data curation in the times of the FAIR Principles and Enabling FAIR Data – challenges and best practices from GFZ Data Services
- GFZ German Research Centre for Geosciences, Data, Information and IT Services, Potsdam, Germany
Following the FAIR principles, research data should be Findable, Accessible, Interoperable and Reusable. Publishing research output under these principles requires generating machine-readable metadata and using persistent identifiers to cross-link the data with descriptive articles, with related software used for processing, and with the physical samples from which the data were derived. In addition, research data should be indexed with domain keywords to facilitate discovery. Scientists need software that helps them generate metadata, because metadata models tend to be complex and serialising metadata into a format for dissemination is a difficult task, especially for long-tail communities.
GFZ Data Services is a domain repository for geosciences data, hosted at the GFZ German Research Centre for Geosciences, that has assigned DOIs to data and scientific software since 2004. The repository focuses on the curation of long-tail data but also provides DOI minting services for several global monitoring networks and observatories in geodesy and geophysics (e.g. INTERMAGNET, the IAG Services ICGEM and IGETS, GEOFON) and for collaborative projects (e.g. TERENO, EnMAP, GRACE, CHAMP). Furthermore, GFZ is an allocating agent for IGSN, a globally unique persistent identifier for physical samples that makes digital sample descriptions discoverable via the internet. GFZ Data Services will also contribute to the National Research Data Infrastructure Consortium for Earth System Sciences (NFDI4Earth) in Germany.
GFZ Data Services increases the interoperability of long-tail data by (1) providing comprehensive, domain-specific data descriptions as standardised, machine-readable metadata complemented by controlled “linked-data” domain vocabularies; (2) supplementing the metadata with technical data descriptions or reports; and (3) embedding the research data in a wider context through cross-references via persistent identifiers (DOI, IGSN, ORCID, FundRef) to related research products and to the people and institutions involved.
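The persistent-identifier cross-references in (3) can be sketched as plain Python dictionaries. This is a minimal sketch, not GFZ's implementation. It validates an ORCID's final check character using ISO 7064 MOD 11-2, the scheme ORCID documents for its identifiers, and then builds creator and related-identifier entries. The field names follow the DataCite metadata schema's JSON representation as we understand it. The helper names and the DOI and IGSN values are hypothetical; the ORCID is ORCID's public test record for Josiah Carberry.

```python
"""Sketch: cross-referencing a dataset via persistent identifiers (hypothetical helpers)."""

DIGITS = "0123456789"


def orcid_checksum_ok(orcid: str) -> bool:
    """Check the final ORCID character using ISO 7064 MOD 11-2."""
    chars = orcid.replace("-", "")
    if len(chars) != 16 or not all(c in DIGITS for c in chars[:15]):
        return False
    total = 0
    for digit in chars[:15]:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[15].upper() == expected


def creator_entry(name: str, orcid: str) -> dict:
    """Creator carrying an ORCID name identifier; rejects malformed ORCIDs."""
    if not orcid_checksum_ok(orcid):
        raise ValueError(f"invalid ORCID: {orcid}")
    return {
        "name": name,
        "nameIdentifiers": [{
            "nameIdentifier": f"https://orcid.org/{orcid}",
            "nameIdentifierScheme": "ORCID",
            "schemeUri": "https://orcid.org",
        }],
    }


def related_entry(identifier: str, id_type: str, relation: str) -> dict:
    """Link to a related product, e.g. an article DOI or a sample IGSN."""
    return {
        "relatedIdentifier": identifier,
        "relatedIdentifierType": id_type,  # e.g. "DOI", "IGSN"
        "relationType": relation,          # e.g. "IsSupplementTo", "IsDerivedFrom"
    }


record = {
    "creators": [creator_entry("Carberry, Josiah", "0000-0002-1825-0097")],
    "relatedIdentifiers": [
        related_entry("10.1234/example-article", "DOI", "IsSupplementTo"),  # hypothetical
        related_entry("XXXX0001", "IGSN", "IsDerivedFrom"),                  # hypothetical
    ],
}
```

Validating the checksum at entry time catches typing errors in ORCIDs before they become broken links in the registered metadata.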
A key tool for metadata generation is the GFZ Metadata Editor, which assists scientists in creating metadata in schemas that are popular in the Earth sciences (ISO 19115, NASA GCMD DIF, DataCite). Emphasis is placed on removing barriers: the editor is publicly available on the internet without registration, a copy of the metadata can be saved to and loaded from the local hard disk, and scientists are not asked to provide information that can be generated automatically. To improve usability, form fields are labelled in the scientists' own domain terminology, and the editor offers a search facility for structured vocabulary lists. In addition, multiple geospatial references can be entered via an interactive mapping tool, which helps to minimize problems arising from the different conventions for specifying latitudes and longitudes.
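A small parser makes the coordinate-convention problem concrete. The hypothetical `to_decimal` helper below is not part of the GFZ Metadata Editor. It accepts signed decimal degrees, or degrees/minutes/seconds with a hemisphere letter, and rejects inputs that mix conventions or fall out of range. It is a sketch of the kind of normalisation that an interactive map makes unnecessary, and it assumes the target representation is signed decimal degrees.

```python
import re

# Signed decimal degrees, or degrees [minutes [seconds]] with an optional hemisphere letter.
_COORD = re.compile(
    r"""^\s*(?P<deg>-?\d+(?:\.\d+)?)\s*°?\s*
        (?:(?P<min>\d+(?:\.\d+)?)\s*['′]?\s*)?
        (?:(?P<sec>\d+(?:\.\d+)?)\s*(?:["″]|'')?\s*)?
        (?P<hemi>[NSEWnsew])?\s*$""",
    re.VERBOSE,
)


def to_decimal(text: str, axis: str) -> float:
    """Normalise a latitude (axis="lat") or longitude (axis="lon") to signed decimal degrees."""
    if axis not in ("lat", "lon"):
        raise ValueError(f"axis must be 'lat' or 'lon', got {axis!r}")
    match = _COORD.match(text)
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    minutes = float(match["min"] or 0)
    seconds = float(match["sec"] or 0)
    if minutes >= 60 or seconds >= 60:
        raise ValueError(f"minutes/seconds out of range: {text!r}")
    value = abs(float(match["deg"])) + minutes / 60 + seconds / 3600
    negative = match["deg"].startswith("-")
    hemi = (match["hemi"] or "").upper()
    if hemi:
        # N/S only make sense for latitudes, E/W only for longitudes
        if hemi not in ("NS" if axis == "lat" else "EW"):
            raise ValueError(f"hemisphere {hemi} does not fit axis {axis}")
        if negative:
            raise ValueError("give either a minus sign or a hemisphere, not both")
        negative = hemi in ("S", "W")
    limit = 90.0 if axis == "lat" else 180.0
    if value > limit:
        raise ValueError(f"{axis} out of range: {text!r}")
    return -value if negative else value
```

For example, `to_decimal("13 3 57 W", "lon")` and `to_decimal("-13.065833", "lon")` describe nearly the same longitude in two conventions. A point picked on a map avoids this ambiguity entirely.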
Visibility of the data is established by registering the metadata with DataCite and by disseminating metadata via standard protocols. The DOI landing pages embed metadata as Schema.org markup to facilitate discovery through internet search engines such as Google Dataset Search. In addition, we feed links between data and related research products into Scholix, which allows data publications to be linked with scholarly literature, even when the data are published years after the article.
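A Schema.org description embedded in a landing page typically takes the form of a JSON-LD `script` element. The sketch below assumes a `Dataset` record with a DOI-based `@id`, creators, keywords and an optional bounding box. The box is given as a `GeoShape` `box`: the lower corner, then the upper corner, each as "latitude longitude". The function names and example values are hypothetical, and this is not the actual GFZ landing-page template.

```python
import json


def dataset_jsonld(doi, name, description, creators, keywords, bbox=None):
    """Build a Schema.org Dataset description; bbox = (south, west, north, east)."""
    record = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "@id": f"https://doi.org/{doi}",
        "identifier": f"https://doi.org/{doi}",
        "name": name,
        "description": description,
        "creator": [{"@type": "Person", "name": person} for person in creators],
        "keywords": list(keywords),
    }
    if bbox is not None:
        south, west, north, east = bbox
        # GeoShape box: lower corner then upper corner, each "latitude longitude"
        record["spatialCoverage"] = {
            "@type": "Place",
            "geo": {"@type": "GeoShape", "box": f"{south} {west} {north} {east}"},
        }
    return record


def jsonld_script(record):
    """Wrap the record for embedding in an HTML landing page."""
    payload = json.dumps(record, ensure_ascii=False, indent=2)
    # Escape "</" so a description containing "</script>" cannot close the element early;
    # "<\/" is still valid JSON and decodes back to "</".
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

Using the DOI URL as `@id` ties the record that search engines index to the same persistent identifier registered with DataCite.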
How to cite: Ulbricht, D., Elger, K., Radosavljevic, B., and Ott, F.: Long-tail data curation in the times of the FAIR Principles and Enabling FAIR Data – challenges and best practices from GFZ Data Services, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-16466, https://doi.org/10.5194/egusphere-egu2020-16466, 2020