ESSI3.3

The evolving Open and FAIR ecosystem for Solid Earth and Environmental sciences: challenges, opportunities, and other adventures

Co-sponsored by AGU

Convener: Florian Haslinger | Co-conveners: Kirsten Elger, Shelley Stall, Katrin Seemeyer, Kristin Vanderbilt

vPICO presentations: Tue, 27 Apr, 13:30–15:00 (CEST)

Chairpersons: Florian Haslinger, Katrin Seemeyer, Kirsten Elger

13:30–13:35

5-minute convener introduction

13:35–13:40

EGU21-13323

solicited

Highlight

The Imperative of Open, Shared, Trusted (FAIR) Scientific Data: Accelerating for the Future

Brooks Hanson

The major societal challenges (ensuring a sustainable planet and ecosystems, with food, energy, water, health, and quality of life provided equitably) depend on convergent science grounded in the Earth and space sciences and on broadly open, shared, and trusted (e.g., FAIR) data. Such data already provide enormous benefits (e.g., weather prediction; hazards avoidance and mitigation; precision navigation). In addition to being needed for these solutions, open FAIR data directly underpin the integrity of, and trust in, science and thus the solutions themselves. But many barriers hinder widespread adoption of FAIR practices. A number of concerned stakeholders are working on the technology and practices needed for FAIR workflows, and thanks to these efforts, the technical pieces for solutions are mostly in place. But a larger coordinated effort is needed, in particular around (i) supporting the infrastructure needed globally, and (ii) developing the research culture and practices needed for universal FAIR data. The first challenge includes recognizing that science is now international and thus that an international FAIR data culture is essential. This requires greater and more urgent attention from the larger science stakeholders: societies, universities and research institutions, funders, and governments.

How to cite: Hanson, B.: The Imperative of Open, Shared, Trusted (FAIR) Scientific Data: Accelerating for the Future, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13323, https://doi.org/10.5194/egusphere-egu21-13323, 2021.

13:40–13:45

EGU21-8052

solicited

Highlight

Advancing the FAIRness and Openness of Earth system science in Europe

Andreas Petzold, Ari Asmi, Katrin Seemeyer, Angeliki Adamaki, Alex Vermeulen, Daniele Bailo, Keith Jeffery, Helen Glaves, Zhiming Zhao, Markus Stocker, and Margareta Hellström

Focused environmental research projects and continuously operating research infrastructures (RIs) designed for monitoring all subdomains of the Earth system contribute to global observing systems and serve as crucial information sources for environmental scientists in their quest for understanding and interpreting the complex Earth System. The EU-funded ENVRI-FAIR project [1] builds on the Environmental Research Infrastructure (ENVRI) community that includes principal European producers and providers of environmental research data and services.

ENVRI-FAIR targets the development and implementation of both technical frameworks and policy solutions that make subdomain boundaries irrelevant for environmental scientists and prepare Earth system science for the new Open Science paradigm. Cross-discipline harmonization and standardization activities, together with the implementation of joint data management and access structures at the RI level, facilitate the strategic coordination of observation systems required for truly interdisciplinary science. ENVRI-FAIR will ultimately create the open access ENVRI-Hub delivering environmental data and services provided by the contributing environmental RIs.

The architecture and functionalities of the ENVRI-Hub are driven by the applications, use cases and user needs, and will be based on three main pillars: (1) the ENVRI Knowledge Base as the human interface to the ENVRI ecosystem; (2) the ENVRI Catalogue as the machine-actionable interface to the ENVRI ecosystem; and (3) subdomain and cross-domain use cases as demonstrators for the capabilities of service provision among ENVRIs and across Science Clusters. The architecture is designed in anticipation of interoperation with the European Open Science Cloud (EOSC) and is intended to act as a key platform for users and developers planning to include ENVRI services in their workflows.

The ENVRI community objectives of sharing FAIRness experience, technologies and training as well as research products and services will be realized by means of the ENVRI-Hub. The architecture, design features, technology developments and associated policies will highlight this example of how ENVRI-FAIR is promoting FAIRness, openness and multidisciplinarity of an entire scientific area by joint developments and implementation efforts.

Acknowledgment: ENVRI-FAIR has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 824068.

[1] Petzold, A., Asmi, A., Vermeulen, A., Pappalardo, G., Bailo, D., Schaap, D., Glaves, H. M., Bundke, U., and Zhao, Z.: ENVRI-FAIR - Interoperable environmental FAIR data and services for society, innovation and research, 15th IEEE International Conference on eScience 2019, 1-4, doi: http://doi.org/10.1109/eScience.2019.00038, 2019.

How to cite: Petzold, A., Asmi, A., Seemeyer, K., Adamaki, A., Vermeulen, A., Bailo, D., Jeffery, K., Glaves, H., Zhao, Z., Stocker, M., and Hellström, M.: Advancing the FAIRness and Openness of Earth system science in Europe, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8052, https://doi.org/10.5194/egusphere-egu21-8052, 2021.

13:45–13:47

EGU21-4737

IPCC Data Distribution Centre: FAIR data from Climate Research to Mitigation Policy

Martin Juckes, Martina Stockhause, Robert S Chen, and Xiaoshi Xing

The Data Distribution Centre (DDC) of the Intergovernmental Panel on Climate Change provides a range of services to support the IPCC Assessment Process. The role of the DDC has evolved considerably since it was established in 1997, responding to the expanding range and complexity of the data products involved in the IPCC assessment process. The role of the IPCC assessments has also evolved from considering whether anthropogenic climate change might have unwelcome consequences and how those consequences would vary under different socio-economic scenarios to reporting on the likely outcome of different global policy options.

The DDC works both with datasets which underpin the key conclusions from the assessment and, increasingly, with data products generated by the scientists engaged in the assessment.

Applying FAIR data principles to data products being produced in the highly constrained context of the assessment process brings many challenges. Working with the Technical Support Units of the IPCC Working Groups and the IPCC Task Group, the IPCC DDC has helped to create a process that not only captures information needed to document data products but supports the consistent and clear description of figures and tables within the report.

How to cite: Juckes, M., Stockhause, M., Chen, R. S., and Xing, X.: IPCC Data Distribution Centre: FAIR data from Climate Research to Mitigation Policy, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-4737, https://doi.org/10.5194/egusphere-egu21-4737, 2021.

13:47–13:49

EGU21-12560

Applying FAIRness evaluation approaches to (meta)data preserved at the World Data Center for Climate (WDCC): results, lessons learned, recommendations

Karsten Peters-von Gehlen, Andrej Fast, Daniel Heydebreck, Heinke Höck, Amandine Kaiser, Andrea Lammert, and Hannes Thiemann

The perceived community demand for research data repositories to provide services ensuring that stored data comply with the FAIR principles requires transparent evaluation of such services. In previous work, the long term archiving service WDCC¹ (World Data Centre for Climate) at DKRZ (German Climate Computing Center, Hamburg) underwent an even-handed self-assessment along the published FAIR principles and the results are published on the DKRZ homepage².

Here, we present results of an overhaul of the previous WDCC FAIRness assessment, in which datasets archived in WDCC were subjected to a number of objective FAIR assessment approaches that are now available as questionnaires or fully automated web applications³,⁴,⁵. In these approaches, FAIRness is assessed using so-called metrics or maturity indicators. While the terminology is more a choice of the test provider (e.g. the term ‘metric’ may be off-putting for some), both give quantitative results. First tests show that (meta)data archived in WDCC seem to attain a higher level of FAIRness when evaluated using questionnaires compared to the results obtained from fully automated applications. Further work is needed to substantiate this finding.

We learn that while neither of the two evaluation approaches is ideal, both show merit. Questionnaires, answered by knowledgeable repository staff, capture domain- and repository-specific aspects of FAIRness, like the use of controlled vocabularies in the datasets, granularity of archived datasets, reuse documentation or clear assessment of local data access protocols. However, the human-performed evaluation does not capture machine-actionability in terms of FAIR. This aspect is, naturally, very well assessed by automatic evaluation approaches, but the results strongly depend on the way the tests for FAIR metrics/maturity indicators are implemented. Moreover, automatic tests often only assess metadata FAIRness, lack domain-specific FAIRness indicators or yield failed tests if a repository’s technical properties, e.g. the specification of authentication procedures for data access, are not compatible with what an automatic procedure is built to test for.

Therefore, since WDCC has an over 30 year long history of preserving climate-science related data with a focus on reusability by the community (and beyond), FAIRness evaluations based on human-actionable questionnaires show a high degree of FAIRness. We further learn that there is an urgent need for specifically-designed automatic FAIR testing approaches taking into account domain-specific data standards and structures. Especially the availability of atmospheric and climate science related FAIR metrics/maturity indicators is very limited. We thus recommend compilations of the latter and we will aim at contributing to this effort.

In our contribution, we specifically showcase strong as well as weak aspects of the WDCC service in terms of FAIRness and report on our measures to increase domain-specific FAIRness of WDCC and present recommendations for establishing FAIR indicators for (meta)data common to the Earth System Science community. We will make the results of our assessment openly available on the WDCC homepage as well as produce a corresponding Open Access peer-reviewed publication.

References:

¹https://cera-www.dkrz.de

²https://cera-www.dkrz.de/WDCC/ui/cerasearch/info?site=fairness

³https://www.rd-alliance.org/node/60731/outputs

⁴https://fairsharing.github.io/FAIR-Evaluator-FrontEnd/#!/

⁵https://www.fairsfair.eu/f-uji-automated-fair-data-assessment-tool

How to cite: Peters-von Gehlen, K., Fast, A., Heydebreck, D., Höck, H., Kaiser, A., Lammert, A., and Thiemann, H.: Applying FAIRness evaluation approaches to (meta)data preserved at the World Data Center for Climate (WDCC): results, lessons learned, recommendations, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12560, https://doi.org/10.5194/egusphere-egu21-12560, 2021.

13:49–13:51

EGU21-14438

Initiating FAIR geothermal data in Indonesia

Dasapta Erwin Irawan

One of the main keys to scientific development is data availability. Data must not only be easy to discover and download; they must also be easy to reuse. Geothermal researchers, research institutions and industries are the three main stakeholders who can foster data sharing and data reuse. Very expensive deep well datasets as well as advanced logging datasets are important not only for exploitation purposes but also for the wider community, e.g. for regional planning or common environmental analyses. In data sharing, we have the four principles of F.A.I.R. data. Principle 1, Findable: data are uploaded to an open repository with proper data documentation and data schema. Principle 2, Accessible: access restrictions such as user ID and password are removed for easy downloads; in the case of data from commercial entities, embargoed data are permitted with a clear embargo duration and data request procedure. Principle 3, Interoperable: all data must be prepared in a manner that allows straightforward data exchange between platforms. Principle 4, Reusable: all data must be submitted in a common conventional file format, preferably a text-based one (e.g. `csv` or `txt`), so that they can be analyzed using various software and hardware. The fact that the geothermal industry is profit-driven and capital-intensive gives it even more reason to embrace data sharing. It would be a good way for companies to demonstrate their role in supporting society. Contributions from multiple stakeholders are the most essential part of science development. In the context of the commercial industry, data sharing is a form of corporate social responsibility (CSR), which shouldn't be defined only as giving out funding to support local communities.

Keywords: open data, FAIR data, data sharing

How to cite: Irawan, D. E.: Initiating FAIR geothermal data in Indonesia, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14438, https://doi.org/10.5194/egusphere-egu21-14438, 2021.

13:51–13:53

EGU21-3922

NFDI4Earth

Hannes Thiemann, Peter Bräsicke, Markus Reichstein, Claus Weiland, Dominik Hezel, Miguel Mahecha, and Lars Bernard

NFDI4Earth (www.nfdi4earth.de) is proposed as the consortium of the German NFDI (National Research Data Infrastructure) to address the digital needs of researchers in Earth System Sciences (ESS). The NFDI4Earth consortium has been created in a bottom-up process and comprises currently 58 members from German universities, research institutions, infrastructure providers, public authorities and different research organizations.

The large number and diversity of observational, analytical, and model data sets in very high spatial, temporal and thematic resolution confront the ESS with a strongly increasing amount of data of great heterogeneity and inherent complexity. Earth system processes constantly change on various time scales and strongly influence each other. Describing and evaluating these processes urgently requires efficient workflows and extremely powerful data analytic frameworks like datacubes as well as appropriate levels of harmonization of related data services and their underlying standards. Research data are currently managed by an unstructured plethora of services that are scattered, heterogeneous and often only project-based without a long-term perspective. A variety of measures and services are bundled under the umbrella of NFDI4Earth in a one-stop service framework. With a common approach to openness and FAIRness, they form a united, sustainable and coherent solution.

In addition to existing links between German and international partners in ESS, NFDI4Earth will establish itself as a single point of contact and the voice for German Earth system scientists in both existing and emerging networks and alliances. NFDI4Earth is for example already striving to establish linkages with federative e-infrastructures like the European Open Science Cloud (EOSC) at an early stage.

How to cite: Thiemann, H., Bräsicke, P., Reichstein, M., Weiland, C., Hezel, D., Mahecha, M., and Bernard, L.: NFDI4Earth, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-3922, https://doi.org/10.5194/egusphere-egu21-3922, 2021.

13:53–13:55

EGU21-9401

ECS

Harmonizing heterogeneous multi-proxy data from Arctic lake sediment records

Gregor Pfalz, Bernhard Diekmann, Johann-Christoph Freytag, and Boris K. Biskaborn

Lake systems play a central role in broadening our knowledge about future trends in the Arctic, as their sediments store information on interactions between climate change, lake ontogeny, external abiotic sediment input, and biodiversity changes. In order to make reliable statements about future lake trajectories, we need sound multi-proxy data from different lakes across the Arctic. Various studies using data from repositories already showed the effectiveness of multi-proxy, multi-site investigations (e.g., Kaufman et al., 2020; PAGES 2k Consortium, 2017). However, there are still datasets from past coring expeditions to Arctic lake systems that are neither included in any of these repositories nor subject to any particular standard. When working with such data from heterogeneous sources, we face the challenge of dealing with data of different formats, types, and structures. It is therefore necessary to transform such data into a uniform format to ensure semantic and syntactic comparability. In this talk, we present an interdisciplinary approach that transforms research data from different lake sediment cores into a coherent framework. Our approach adapts methods from the database field, such as developing entity-relationship (ER) diagrams, to understand the conceptual structure of the data independently of the source. Based on this knowledge, we developed a conceptual data model that allows scientists to integrate heterogeneous data into a common database. During the talk, we present further steps to prepare datasets for multi-site statistical investigation. To test our approach, we compiled and transformed a collection of published and unpublished paleolimnological data of Arctic lake systems into our proposed format. Additionally, we show our results from conducting a comparative analysis on a set of acquired data, focusing on comparing total organic carbon and bromine content.

We conclude that our harmonized dataset enables numerical inter-proxy and inter-lake comparison despite strong initial heterogeneity.
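The harmonization step described in this abstract can be pictured with a small sketch. It is only an illustration under stated assumptions: the `Measurement` entity, the two source layouts, the column names and all values are hypothetical and are not taken from the authors' data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One proxy value at one depth of one core (hypothetical target entity)."""
    core_id: str
    depth_cm: float
    proxy: str
    value: float
    unit: str


def from_wide_row(core_id, row):
    """Hypothetical source A: depth in metres, one column per proxy."""
    depth_cm = round(float(row["depth_m"]) * 100.0, 6)  # convert m to cm
    units = {"TOC": "percent", "Br": "counts"}  # assumed units per proxy
    return [
        Measurement(core_id, depth_cm, proxy, float(row[proxy]), unit)
        for proxy, unit in units.items()
        if row.get(proxy) not in (None, "")  # skip missing values
    ]


def from_long_row(row):
    """Hypothetical source B: one row per measurement, depth already in cm."""
    return [Measurement(row["core"], float(row["depth_cm"]), row["proxy"],
                        float(row["value"]), row["unit"])]


# Invented example rows, for illustration only.
source_a = [{"depth_m": "0.05", "TOC": "3.2", "Br": "120"},
            {"depth_m": "0.10", "TOC": "2.9", "Br": ""}]
source_b = [{"core": "LAKE-B", "depth_cm": "5", "proxy": "TOC",
             "value": "4.1", "unit": "percent"}]

harmonized = [m for row in source_a for m in from_wide_row("LAKE-A", row)]
harmonized += [m for row in source_b for m in from_long_row(row)]

# Once every record shares one structure, inter-lake comparison is a simple filter.
toc = sorted((m.core_id, m.depth_cm, m.value) for m in harmonized if m.proxy == "TOC")
```

Keeping one row per core, depth and proxy is one common way to make differently structured sources comparable; the authors' actual model may differ.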

[1] D. S. Kaufman et al., “A global database of Holocene paleotemperature records,” Sci. Data, vol. 7, no. 115, pp. 1–34, 2020.

[2] PAGES 2k Consortium, “A global multiproxy database for temperature reconstructions of the Common Era,” Sci. Data, vol. 4, no. 170088, pp. 1–33, 2017.

How to cite: Pfalz, G., Diekmann, B., Freytag, J.-C., and Biskaborn, B. K.: Harmonizing heterogeneous multi-proxy data from Arctic lake sediment records, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-9401, https://doi.org/10.5194/egusphere-egu21-9401, 2021.

13:55–13:57

EGU21-2546

ECS

Managing Geophysics datasets: Challenges and perspectives from the UK Polar Data Centre

Alice Fremand

Today, open data policies are better understood by scientists and writing a data management plan is part of every Natural Environment Research Council (NERC) project submission. But that means that scientists are expecting more and more from their data publication or data requests: they want interactive maps, they want more complex data systems, they want to query data and publish them rapidly.

At the UK Polar Data Centre (PDC, https://www.bas.ac.uk/data/uk-pdc/), the datasets are very diverse, reflecting the multidisciplinary nature of polar science. Geophysics datasets include bathymetry, aerogravity, aeromagnetics and airborne radar depth soundings. Encouraging reuse and increasing the value of data is at the core of PDC’s mission. Data published by the PDC are used in a large variety of scientific research projects internationally. For instance, the significant datasets from seabed multibeam coverage of the Southern Ocean enable the British Antarctic Survey to be a major contributor to multiple projects such as the International Bathymetric Chart of the Southern Ocean (IBCSO) and Seabed 2030. The wide coverage of airborne radar echo sounding over Antarctica is crucial for the SCAR BEDMAP3 project, which aims to produce a new map of Antarctic ice thickness and bed topography for the international glaciology and geophysical community.

Over the last year, procedures to preserve, archive and distribute these data have been revised and updated to comply with the requirements of CoreTrustSeal. But we are still looking for new technologies, tools, open-source software that will help us bring interactivity to our datasets and reach the expectations of scientists.

How to cite: Fremand, A.: Managing Geophysics datasets: Challenges and perspectives from the UK Polar Data Centre, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-2546, https://doi.org/10.5194/egusphere-egu21-2546, 2021.

13:57–13:59

EGU21-16054

Advancing the Geosciences through Open Standards

Siri Jodha Khalsa

Data is the lifeblood of the geosciences. Furthermore, the acquisition, processing and interpretation of data all depend on established specifications describing the systems and procedures that were used in producing, describing and distributing that data. It can be said that technical standards underpin the entire scientific endeavour. This is becoming ever truer in the era of Big Data and Open, Transdisciplinary Science. It takes the dedicated efforts of many individuals to create a viable standard. This presentation will describe the experiences and status of standards development activities related to geoscience remote sensing technologies which are being carried out under the auspices of the IEEE Geoscience and Remote Sensing Society (GRSS).

A Standards Development Organization (SDO) exists to provide the environment, rules and governance necessary to facilitate the fair and equitable development of standards, and to assist in the distribution and maintenance of the resulting standards. The GRSS sponsors projects with the IEEE Standards Association (IEEE-SA), which, like other SDOs such as ISO and OGC, has well-defined policies and procedures that help ensure the openness and integrity of the standards development process. Each participant in a standards working group typically brings specific interests as a producer, consumer or regulator of a product, process or service. Creating an environment that makes it possible to find consensus among competing interests is a primary role of an SDO. I will share some of the insights gained from the six standards projects that the GRSS has initiated which involve hyperspectral imagers, the spectroscopy of soils, synthetic aperture radar, microwave radiometers, GNSS reflectometry, and radio frequency interference in protected geoscience bands.

How to cite: Khalsa, S. J.: Advancing the Geosciences through Open Standards, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16054, https://doi.org/10.5194/egusphere-egu21-16054, 2021.

13:59–14:01

EGU21-15783

A Standards-based Data Catalogue integrating scientific, community-based and citizen science data across the Arctic

Torill Hamre, Finn Danielsen, Michael Køie Poulsen, and Frode Monsen

INTAROS is a Horizon 2020 research and innovation project developing an integrated Arctic Observation System by extending, improving, and unifying existing systems in the different regions of the Arctic. INTAROS integrates distributed repositories hosting data from ocean, atmosphere, cryosphere and land, including scientific, community-based monitoring (CBM) and citizen science (CS) data. Throughout the project, INTAROS has been working closely with several local communities and citizen science programs across the Arctic, to develop strategies and methods for ingestion of data into repositories enabling the communities to maintain and share data. A number of these CBM and CS data collections have been registered in the INTAROS Data Catalogue. Some of these collections are hosted and sustained by large international programs such as PISUNA, eBird, Secchi Disk Study and GLOBE Observer. Registration in the INTAROS Data Catalogue contributes to making these important data collections better known in a wider community of users with a vested interest in the Arctic. It also enables sharing of metadata through open standards for inclusion in other Arctic data systems. This catalogue is a key component in INTAROS, enabling users to search for data across the targeted spheres to assess their usefulness in applications and geographic areas. The catalogue is based on a world-leading system for data management, the Comprehensive Knowledge Archive Network (CKAN). With rich functionality offered out of the box combined with a flexible extension mechanism, CKAN allows for quickly setting up a fully functional data catalogue. The CKAN open-source community offers numerous extensions that can be used as-is or adapted to implement customised functionality for specific user communities. To hold additional metadata elements requested by the partners, we modified the standard database schema of CKAN.

The presentation will focus on the current capabilities and plans for sustaining and enhancing the INTAROS Data Catalogue.
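As an illustration of how a CKAN-based catalogue can be searched programmatically, the sketch below builds a request for CKAN's `package_search` action and reads titles from a response of the shape that action returns. The base URL is a placeholder rather than the address of the INTAROS Data Catalogue, the sample response is invented, and no network request is made.

```python
import json
from urllib.parse import urlencode

# Placeholder address (assumption); substitute the real catalogue URL.
CATALOGUE_URL = "https://catalogue.example.org"


def package_search_url(base_url, query, rows=10):
    """Build a URL for CKAN's package_search action (Action API v3)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Extract dataset titles from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return [dataset["title"] for dataset in payload["result"]["results"]]


url = package_search_url(CATALOGUE_URL, "sea ice", rows=5)

# Invented response used in place of a live call, for illustration only.
sample_response = json.dumps({
    "success": True,
    "result": {"count": 1,
               "results": [{"name": "example", "title": "Example sea ice observations"}]},
})
titles = dataset_titles(sample_response)
```

In practice the URL would be fetched with an HTTP client and the response body passed to `dataset_titles`.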

How to cite: Hamre, T., Danielsen, F., Køie Poulsen, M., and Monsen, F.: A Standards-based Data Catalogue integrating scientific, community-based and citizen science data across the Arctic, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15783, https://doi.org/10.5194/egusphere-egu21-15783, 2021.

14:01–14:03

EGU21-14899

ECS

Curating geosciences data in the Earth, Space and Environmental Sciences – new developments of GFZ Data Services

Florian Ott, Kirsten Elger, and Damian Ulbricht

GFZ Data Services is a domain repository for geosciences data that has assigned digital object identifiers (DOIs) to data and scientific software since 2004. Hosted at the GFZ German Research Centre for Geosciences (GFZ), the repository focuses on the curation of long-tail data, but also provides DOI minting services for several global monitoring networks/observatories in geodesy and geophysics (e.g. INTERMAGNET; IAG Services ICGEM, IGETS, IGS; GEOFON) and collaborative projects (TERENO, EnMAP, GRACE, CHAMP). Furthermore, as Allocating Agent for IGSN, the globally unique persistent identifier for physical samples, GFZ provides IGSN minting services for physical samples.

GFZ Data Services increases the interoperability of long-tail data through (1) the provision of comprehensive domain-specific data description via standardised and machine-readable metadata with controlled domain vocabularies; (2) complementing the metadata with comprehensive and standardised technical data descriptions or reports; and (3) by embedding the research data in wider context by providing cross-references through Persistent Identifiers (DOI, IGSN, ORCID, Fundref) to related research products (text, data, software) and people or institutions involved.

Visibility of the data is established through registration of the metadata at DataCite and the dissemination of metadata in standard protocols. The DOI Landing Pages embed metadata in Schema.org to facilitate discovery through internet search engines like the Google Dataset Search. In addition, we feed links of data and related research products into Scholix, which allows data publications and scholarly literature to be linked, even when the data are published years after the article.
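Embedding Schema.org metadata in a landing page usually means adding a JSON-LD block of type `Dataset`. The sketch below shows one plausible way to generate such a block. The dataset name, DOI, creator and licence values are placeholders, and the selection of properties is an assumption rather than the markup GFZ Data Services actually emits.

```python
import json


def dataset_jsonld(name, doi, creators, publisher, license_url):
    """Assemble a minimal Schema.org Dataset description as a dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "identifier": f"https://doi.org/{doi}",
        "creator": [{"@type": "Person", "name": creator} for creator in creators],
        "publisher": {"@type": "Organization", "name": publisher},
        "license": license_url,
    }


def as_script_tag(record):
    """Wrap the record in the script element that search engines look for."""
    body = json.dumps(record, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder values, for illustration only.
record = dataset_jsonld(
    name="Example geophysical dataset",
    doi="10.1234/example",
    creators=["Jane Doe"],
    publisher="Example Data Repository",
    license_url="https://creativecommons.org/licenses/by/4.0/",
)
snippet = as_script_tag(record)
```

The resulting `snippet` would be placed in the HTML of the landing page so that crawlers can read the dataset description without parsing the visible page.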

The new Website of GFZ Data Services has further developed from a searchable data portal (only) to an information point for data publications and data management. This includes information on metadata, data formats, the data publication workflow, FAQ, links to different versions of our metadata editor and downloadable data description templates. Specific data publication guidance is complemented by more general information on data management, like a data management roadmap for PhD students, and links to the data catalogue of GFZ Data Services, the IGSN catalogue of GFZ and RI@GFZ – the data and research infrastructure search portal of GFZ.

GFZ has been a DataCite member since October 2020. This membership will enable and promote active participation in the current and future venues of technological and service-oriented developments related to the persistent identification of research outputs.

How to cite: Ott, F., Elger, K., and Ulbricht, D.: Curating geosciences data in the Earth, Space and Environmental Sciences – new developments of GFZ Data Services, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14899, https://doi.org/10.5194/egusphere-egu21-14899, 2021.

14:03–14:05

EGU21-13796

Linking Domain Repositories to Build Cyberinfrastructure for Interdisciplinary Critical Zone Research

Jeffery S. Horsburgh, Kerstin Lehnert, and Jerad Bales

Critical Zone science studies the system of coupled chemical, biological, physical, and geological processes operating together across all scales to support life at the Earth's surface (Brantley et al., 2007). In 2020, the U.S. National Science Foundation funded 10 Critical Zone Collaborative Network awards. These 5-year projects will collaboratively work to answer scientific questions relevant to understanding processes in the Critical Zone such as the effects of urbanization on Critical Zone processes; Critical Zone function in semi-arid landscapes and the role of dust in sustaining these ecosystems; processes in deep bedrock and their relationship to Critical Zone evolution; the recovery of the Critical Zone from disturbances such as fire and flooding; and changes in the coastal Critical Zone related to rising sea level. In order to support community data collection, access, and archival for the Critical Zone Network community, the development of new cyberinfrastructure (CI) is now underway that leverages prior investments in domain-specific data repositories that are already operational and deliver data services to established communities. The goal is to create the infrastructure required for managing, curating, disseminating, and preserving data from the new network of Critical Zone Cluster projects, along with legacy datasets from the existing Critical Zone Observatory Network, including digital management of physical samples. This CI will have a distributed architecture that links existing data facilities and services, including HydroShare, EarthChem, SESAR (System for Earth Sample Registration), and eventually other systems like OpenTopography as needed, via a central CZ Hub that provides tools and services for simplified data submission, integrated data discovery and access, and links to computational resources for data analysis and visualization in support of CZ synthesis efforts.

Our goal is to make data, samples, and software collected by the CZ Network Cluster projects Findable, Accessible, Interoperable, and Reusable following the FAIR guiding principles for scientific data management and stewardship, by taking advantage of existing, FAIR-compliant, domain-specific data repositories. This collaboration among domain repositories to deliver integrated data services for an interdisciplinary science program will provide a template for future development of integrated interdisciplinary data services.

How to cite: Horsburgh, J. S., Lehnert, K., and Bales, J.: Linking Domain Repositories to Build Cyberinfrastructure for Interdisciplinary Critical Zone Research, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13796, https://doi.org/10.5194/egusphere-egu21-13796, 2021.

14:05–14:07

EGU21-16356

re3data COREF – Enhancing the re3data service as a community-driven and trustworthy resource for research data repositories and portals

Nina Weisweiler, Kirsten Elger, Robert Ulrich, Michael Witt, Lea Maria Ferguson, Maxi Kindling, Gabriele Kloska, Nguyen Thanh Binh, Rouven Schabinger, Dorothea Strecker, Margarita Trofimenko, and Paul Vierkant

re3data is the global registry for research data repositories. As of January 2021, the service lists over 2620 digital repositories across all scientific disciplines and provides an extensive description of repositories based on a detailed metadata schema (https://doi.org/10.2312/re3.008). A variety of funders, publishers, and scientific organizations around the world refer to re3data within their guidelines and policies, recommending the service to researchers looking for appropriate repositories for storage and discovery of research data. With over 750 entries the field of geosciences is one of the most strongly represented subject groups in the registry.

The re3data COREF project (Community Driven Open Reference for Research Data Repositories) started in January 2020 and receives funding from the German Research Foundation (DFG) for 36 months. With its main focus on the current project, the presentation will outline the further professionalization of re3data and the provision of reliable and individualizable descriptions of research data repositories. This includes updates and revisions of the metadata schema, the advancement of the technical infrastructure, and an enhanced overall (technical) service model concept to embed and connect the service within the research data landscape as a community-driven source and reference for trustworthy repositories.

In addition, outcomes from the first re3data COREF stakeholder survey and workshop held in November 2020 will be presented, introducing diverse use cases of the re3data service and examples of the reuse of its metadata. The presentation will address how re3data currently interlinks with external parties and how more advanced options for easier and trustworthy integration of third-party information can be facilitated.

How to cite: Weisweiler, N., Elger, K., Ulrich, R., Witt, M., Ferguson, L. M., Kindling, M., Kloska, G., Binh, N. T., Schabinger, R., Strecker, D., Trofimenko, M., and Vierkant, P.: re3data COREF – Enhancing the re3data service as a community-driven and trustworthy resource for research data repositories and portals, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16356, https://doi.org/10.5194/egusphere-egu21-16356, 2021.

14:07–14:09

EGU21-3648

Citation and credit: The role of researchers, journals, and repositories to ensure data, software and samples are linked to publications with proper attribution.

Shelley Stall, Helen Glaves, Brooks Hanson, Kerstin Lehnert, Erin Robinson, and Lesley Wyborn

The Earth, space, and environmental sciences have made significant progress in awareness and implementation of policy and practice around the sharing of data, software, and samples. Specifically, the Coalition for Publishing Data in the Earth and Space Sciences (https://copdess.org/) brings together data repositories and journals to discuss and address common challenges in support of more transparent and discoverable research and the supporting data. Since the inception of COPDESS in 2014 and the completion of the Enabling FAIR Data Project in 2019, work has continued on the improvement of availability statements for data and software as well as corresponding citations.

As the broad research community continues to make progress around data and software management and sharing, COPDESS is focused on several key efforts. These include 1) supporting authors in identifying the most appropriate data repository for preservation, 2) validating that all manuscripts have data and software availability statements, 3) ensuring data and software citations are properly included and linked to the publication to support credit, and 4) encouraging adoption of best practices.

We will review the status of these current efforts around data and software sharing, the important role that repositories and researchers have to ensure that automated credit and attribution elements are in place, and the recent publications on software citation guidance from the FORCE11 Software Implementation Working Group.

How to cite: Stall, S., Glaves, H., Hanson, B., Lehnert, K., Robinson, E., and Wyborn, L.: Citation and credit: The role of researchers, journals, and repositories to ensure data, software and samples are linked to publications with proper attribution, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-3648, https://doi.org/10.5194/egusphere-egu21-3648, 2021.

14:09–14:11

EGU21-1299

FAIR, Open and Free does not mean no restrictions

Keith Jeffery

FAIR, open and free are rarely used correctly to describe access to assets. In fact, assets expected to be, or described as, FAIR, open and free are subject to many restrictions. The major ones are:

(1) Security: to protect the asset from unavailability and any process from corruption, related to curation. Security breaches may be criminal.

(2) Privacy: to protect any personal data within or about the asset. The General Data Protection Regulation (GDPR) is highly relevant here and severe penalties are available.

(3) Rights and licences: the asset may be subject to claimed rights (such as copyright or database right or even patenting) and also to licensing which may be more or less restrictive.

(4) Authorisation: within an Authentication, Authorisation, Accounting Infrastructure (AAAI), access is permitted only to an authenticated user who is authorised in a given user role (owner, manager...) to use assets in appropriate modes (read, update...), possibly within a certain time period and subject to asset licensing.

(5) Terms and Conditions: the system controlling the assets may have associated terms and conditions of use including - but not restricted to - liability, user behaviour, use of cookies.

In EPOS we are drawing together all these aspects into an integrated policy-driven set of mechanisms in the system, including rich metadata, policy and licence documents, informed consent at the user interface and an AAAI system based on the recommendations of AARC (https://aarc-project.eu/).

How to cite: Jeffery, K.: FAIR, Open and Free does not mean no restrictions, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-1299, https://doi.org/10.5194/egusphere-egu21-1299, 2021.

14:11–14:13

EGU21-10817

ECS

Why and how does CDGP limit access to some deep geothermal data

Mathieu Turlure, Marc Schaming, Jean Schmittbuhl, and Marc Grunberg

The Data Centre for Deep Geothermal Energy (CDGP – Centre de Données de Géothermie Profonde, https://cdgp.u-strasbg.fr) was launched in 2016 by the LabEx G-EAU-THERMIE PROFONDE (now ITI GeoT, https://iti-geot.unistra.fr/) to preserve, archive and distribute data acquired on geothermal sites in Alsace. Since the beginning of the project, specific procedures have been followed to respect international requirements for data management. In particular, FAIR recommendations are used to distribute Findable, Accessible, Interoperable and Reusable data “As Open as Possible, as Closed as Necessary”.

CDGP distributes data originating from academic institutions as well as industrial partners. The former are open and disseminated without restriction, to fulfil Open Science requirements. The latter are less open, depending on the access restrictions given by the data owner. Up to now, the industrial data may be open, restricted to academics, distributed case-by-case (after the owner’s agreement), or closed. Metadata are fully open. The access rights are also pushed to the EPOS TCS-AH platform (https://tcs.ah-epos.eu).

CDGP implemented an Authentication, Authorization and Accounting Infrastructure (AAAI) to handle the distribution rules. Business category is verified at least for academics to grant access. Datasets are provided (or denied) automatically if possible. If necessary, the user’s request is forwarded to the provider, who can accept or disallow access. Reports listing datasets distributed to users are sent to providers every six months. This AAAI is built to earn and keep data providers’ trust, as well as to publicize data.
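
The distribution rules described above can be sketched roughly as follows. This is only an illustrative sketch, not CDGP's actual implementation: the names `AccessLevel`, `Decision`, `User` and `decide` are hypothetical, and the choice to deny a non-academic user automatically for academic-only data is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class AccessLevel(Enum):
    """Access levels named in the abstract for industrial data."""
    OPEN = "open"
    ACADEMIC = "restricted to academics"
    CASE_BY_CASE = "case-by-case"
    CLOSED = "closed"


class Decision(Enum):
    """Possible outcomes of a dataset request (hypothetical names)."""
    GRANT = "grant"
    DENY = "deny"
    FORWARD_TO_PROVIDER = "forward to provider"


@dataclass(frozen=True)
class User:
    name: str
    is_verified_academic: bool  # result of the business-category check


def decide(level: AccessLevel, user: User) -> Decision:
    """Return an automatic decision where possible, else forward the request."""
    if level is AccessLevel.OPEN:
        return Decision.GRANT
    if level is AccessLevel.ACADEMIC:
        # Assumption: non-academic users are denied automatically.
        return Decision.GRANT if user.is_verified_academic else Decision.DENY
    if level is AccessLevel.CASE_BY_CASE:
        # The data owner accepts or disallows each request individually.
        return Decision.FORWARD_TO_PROVIDER
    return Decision.DENY  # CLOSED: data are not distributed
```

In this sketch only case-by-case requests reach the data provider; every other access level is resolved automatically, which mirrors the "provided (or denied) automatically if possible" rule.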

CDGP is trying to broaden the number of open datasets. There are questions on access restrictions to some vintage industrial data of Soultz-sous-Forêts, since some of them were acquired with public European funding. Also, industrial data from the Vendenheim area, where several felt earthquakes occurred (2019, 2020), are currently not available but may become partly accessible, since some exploration was done for “scientific purpose” and expert studies are required to understand the induced seismicity.

How to cite: Turlure, M., Schaming, M., Schmittbuhl, J., and Grunberg, M.: Why and how does CDGP limit access to some deep geothermal data, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10817, https://doi.org/10.5194/egusphere-egu21-10817, 2021.

14:13–15:00

Meet the authors in their breakout text chats