- 1Alfred-Wegener-Institut Helmholtz-Zentrum für Polar- und Meeresforschung, Computing and Data Center, Bremerhaven, Germany (claudia.mueller@awi.de)
- 2Alfred-Wegener-Institut Helmholtz-Zentrum für Polar- und Meeresforschung, Computing and Data Center, Bremerhaven, Germany (stephan.frickenhaus@awi.de)
The NFDI4Earth (National Research Data Infrastructure for Earth System Sciences) initiative aims to create a sustainable research data infrastructure for Earth System Science (ESS) that aligns with the FAIR principles (Findable, Accessible, Interoperable, and Reusable). As part of the NFDI4Earth project, 150 research data repositories covering highly diverse subject areas were collected to build the NFDI4Earth Service Catalogue. One consequence of this diversity is that small and niche repositories lack visibility beyond their own community. Because researchers in ESS often cooperate across disciplines, there is also a need to discover multidisciplinary repositories. Repository metadata changes constantly and has to be updated individually and manually. We therefore identified the need for a sustainable, already existing platform that NFDI4Earth can harvest as a data source. Such a platform is already available in re3data (Registry of Research Data Repositories), a global registry of research data repositories. re3data has been operating for more than a decade, is maintained by an international scientific network, and describes repositories with well-defined metadata fields. Nevertheless, we identified further metadata fields with the potential to add value. These include the “geographical extent of data”, the “maximum data upload size”, and whether the repository is “unrestricted to external upload” (in contrast to hosting only data from the maintaining institution, or being restricted to project data). In a first approach, “maximum data upload size” and “unrestricted to external upload” were identified as the metadata fields that would add the most value and characterize repositories further. Scientists in ESS often deal with big data, for example in remote sensing and satellite imagery, climate modeling, or environmental monitoring, so this information is important for guiding them to a suitable repository.
When we started this study, our hypothesis was that data repositories accept any kind of data upload but impose a maximum upload size. Yet our study showed that 70% of the repositories we evaluated restrict data upload in one way or another (by region, e.g. Australia Ocean Data Network Portal; by topic, e.g. GEOFON; or by institutional affiliation, e.g. Geoportal BGR), and just 10% are unrestricted with respect to the kind of data but have a maximum upload size. The remaining 20% of the evaluated repositories are unrestricted with respect to the kind of data and have no maximum upload size (e.g. PANGAEA). One result of this study is therefore that repositories have to be differentiated by the upload characteristics “restricted” and “unrestricted” with respect to the kind and size of the data. Highlighting this characteristic more clearly in the future should make it easier for users to distinguish multidisciplinary repositories from niche repositories. The next steps will be to discuss these findings with the wider ESS community, to identify further valuable metadata fields, some of which will be domain-specific, and to advance the inclusion of these fields in re3data.
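The three upload categories distinguished above can be illustrated with a minimal sketch. It is not the procedure used in the study and does not reflect the re3data schema: the class and field names (`Repository`, `unrestricted_to_external_upload`, `max_upload_size_gb`) are assumptions chosen to mirror the two proposed metadata fields, and `ExampleRepo` with its 50 GB limit is a hypothetical entry, since no repository of that category is named here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class UploadCategory(Enum):
    RESTRICTED = "restricted"
    UNRESTRICTED_WITH_LIMIT = "unrestricted, with maximum upload size"
    UNRESTRICTED_NO_LIMIT = "unrestricted, no maximum upload size"


@dataclass
class Repository:
    # Field names are illustrative assumptions, not re3data fields.
    name: str
    unrestricted_to_external_upload: bool
    max_upload_size_gb: Optional[float] = None  # None: no stated limit


def classify(repo: Repository) -> UploadCategory:
    """Assign a repository to one of the three upload categories."""
    # Any restriction by region, topic, or affiliation makes it "restricted",
    # regardless of whether a size limit is also stated.
    if not repo.unrestricted_to_external_upload:
        return UploadCategory.RESTRICTED
    if repo.max_upload_size_gb is not None:
        return UploadCategory.UNRESTRICTED_WITH_LIMIT
    return UploadCategory.UNRESTRICTED_NO_LIMIT


repos = [
    Repository("GEOFON", unrestricted_to_external_upload=False),
    Repository("PANGAEA", unrestricted_to_external_upload=True),
    # Hypothetical entry with a made-up limit, for illustration only.
    Repository("ExampleRepo", unrestricted_to_external_upload=True,
               max_upload_size_gb=50.0),
]

categories = {repo.name: classify(repo) for repo in repos}
for name, category in categories.items():
    print(f"{name}: {category.value}")
```

Treating the size limit as optional keeps "no maximum upload size" distinct from an unknown or zero value, which matters when such a field is harvested automatically.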
How to cite: Müller, C. and Frickenhaus, S.: Characterizing the Diversity of Data Repositories in ESS, and the Role of re3data, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-9589, https://doi.org/10.5194/egusphere-egu25-9589, 2025.