ESSI2.2

Find, access, share and use data across the globe: Infrastructure solutions for Earth System Sciences

Convener: Anca Hienola | Co-conveners: Amber Budden, Jacco Konijn, Susan Shingledecker, Lesley Wyborn

vPICO presentations

| Tue, 27 Apr, 11:00–12:30 (CEST)

Public information:

The session will present examples from different fields of expertise in the Environmental and Earth system domain (research and e-infrastructures, repositories and data hubs, software frameworks, interdisciplinary data users, global and domain-specific initiatives, etc.), who are demonstrating the effective use of cloud-based e-infrastructures, their services, quality checks and tools. In that sense, this session also supports tackling the existing and upcoming challenges in the evolution of an integrated, Open and FAIR research ecosystem.

vPICO presentations: Tue, 27 Apr

Chairpersons: Anca Hienola, Jacco Konijn, Lesley Wyborn

11:00–11:05

5-minute convener introduction

Large Infrastructures

11:05–11:15

EGU21-8458

solicited

The ICOS Carbon Portal as example of a FAIR community data repository supporting scientific workflows

Alex Vermeulen, Margareta Hellström, Oleg Mirzov, Ute Karstens, Claudio D'Onofrio, and Harry Lankreijer

The Integrated Carbon Observation System (ICOS) provides long term, high quality observations that follow (and cooperatively set) the global standards for the best possible quality data on the atmospheric composition for greenhouse gases (GHG), greenhouse gas exchange fluxes measured by eddy covariance and CO₂ partial pressure at water surfaces. The ICOS observational data feeds into a wide area of science that covers for example plant physiology, agriculture, biology, ecology, energy & fuels, forestry, hydrology, (micro)meteorology, environmental, oceanography, geochemistry, physical geography, remote sensing, earth-, climate-, soil- science and combinations of these in multi-disciplinary projects.
As ICOS is committed to provide all data and methods in an open and transparent way as free data, a dedicated system is needed to secure the long term archiving and availability of the data together with the descriptive metadata that belongs to the data and is needed to find, identify, understand and properly use the data, also in the far future, following the FAIR data principles. An added requirement is that the full data lifecycle should be completely reproducible to enable full trust in the observations and the derived data products.

In this presentation we will introduce the ICOS operational data repository, the ICOS Carbon Portal, which is based on the linked open data approach. All metadata is modelled in an ontology coded in OWL and based on an RDF triple store that is available through an open SPARQL endpoint. The repository supports versioning and collections, and models provenance through a simplified PROV-O ontology. All data objects are ingested under strict control for the identified data types, conditional on the provision of correct and sufficient (provenance) metadata, data format and data integrity. All data, including raw data, is stored in the long-term trusted repository B2SAFE with two replicates. On top of the triple store and SPARQL endpoint we have built a series of services, APIs and graphical interfaces that allow machine-to-machine and user interaction with the data and metadata. Examples are a full faceted search with connected data cart and download facility, preview of higher-level data products (time series of point observations and spatial data), and cloud computing services like eddy covariance data processing and on-demand atmospheric footprint calculations, all connected to the observational data from ICOS. Another interesting development is the community support for scientific workflows using Jupyter notebook services that connect to our repository through a dedicated Python library for direct metadata and data access.
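As an illustration of what machine access through an open SPARQL endpoint can look like, the following minimal sketch builds a GET request URL in the style of the SPARQL 1.1 Protocol, where the query text travels in the `query` parameter. The endpoint address and the query are assumptions chosen for illustration only; they are not the Carbon Portal's actual endpoint or ontology terms, and the dedicated Python library mentioned above is not used here.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration; substitute a real one.
SPARQL_ENDPOINT = "https://example.org/sparql"

# Generic query: list a few resources together with their rdfs:label.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?obj ?label WHERE {
  ?obj rdfs:label ?label .
} LIMIT 5
"""


def build_sparql_get_url(endpoint: str, query: str) -> str:
    """Return a GET URL that carries the query in the 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query.strip()})}"


url = build_sparql_get_url(SPARQL_ENDPOINT, QUERY)
print(url)
```

The resulting URL could be fetched with any HTTP client, typically with an `Accept` header requesting a result format such as `application/sparql-results+json`.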

How to cite: Vermeulen, A., Hellström, M., Mirzov, O., Karstens, U., D'Onofrio, C., and Lankreijer, H.: The ICOS Carbon Portal as example of a FAIR community data repository supporting scientific workflows, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8458, https://doi.org/10.5194/egusphere-egu21-8458, 2021.

11:15–11:17

EGU21-15394

EPOS-Norway Portal

Jan Michalek, Kuvvet Atakan, Christian Rønnevik, Helga Indrøy, Lars Ottemøller, Øyvind Natvik, Tor Langeland, Ove Daae Lampe, Gro Fonnes, Jeremy Cook, Jon Magnus Christensen, Ulf Baadshaug, Halfdan Pascal Kierulf, Bjørn-Ove Grøtan, Odleiv Olesen, John Dehls, and Valerie Maupin

The European Plate Observing System (EPOS) is a European project about building a pan-European infrastructure for accessing solid Earth science data, governed now by EPOS ERIC (European Research Infrastructure Consortium). The EPOS-Norway project (EPOS-N; RCN-Infrastructure Programme - Project no. 245763) is a Norwegian project funded by the National Research Council. The aim of the Norwegian EPOS e‑infrastructure is to integrate data from the seismological and geodetic networks, as well as the data from the geological and geophysical data repositories. Among the six EPOS-N project partners, four institutions are providing data: University of Bergen (UIB), Norwegian Mapping Authority (NMA), Geological Survey of Norway (NGU) and NORSAR.

In this contribution, we present the EPOS-Norway Portal as an online, open access, interactive tool, allowing visual analysis of multidimensional data. It supports maps and 2D plots with linked visualizations. Currently access is provided to more than 300 datasets (18 web services, 288 map layers and 14 static datasets) from four subdomains of Earth science in Norway. New datasets are planned to be integrated in the future. The EPOS-N Portal can access remote datasets via web services like FDSNWS for seismological data and OGC services for geological and geophysical data (e.g. WMS). Standalone datasets are available through preloaded data files. Users can also simply add another WMS server or upload their own dataset for visualization and comparison with other datasets. This portal provides a unique way (the first of its kind in Norway) to explore various geoscientific datasets in one common interface. One of the key aspects is quick simultaneous visual inspection of data from various disciplines and testing of scientific or geohazard-related hypotheses. One such example is the spatio-temporal correlation of earthquakes (1980 until now) with existing critical infrastructures (e.g. pipelines), geological structures, submarine landslides or unstable slopes.
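To make the two access routes named above more concrete, the sketch below builds an fdsnws-event query URL and a WMS 1.3.0 GetMap URL using standard parameter names from those specifications. The host names, the layer name and the bounding box values are hypothetical placeholders, not EPOS-N services, and the code only constructs URLs without contacting any server.

```python
from urllib.parse import urlencode

# Hypothetical hosts; real deployments publish their own base URLs.
FDSN_BASE = "https://fdsn.example.org"
WMS_BASE = "https://wms.example.org/service"


def fdsn_event_url(base, start, end, min_lat, max_lat, min_lon, max_lon, min_mag=None):
    """Build an fdsnws-event query URL for a time window and bounding box."""
    params = {
        "starttime": start,
        "endtime": end,
        "minlatitude": min_lat,
        "maxlatitude": max_lat,
        "minlongitude": min_lon,
        "maxlongitude": max_lon,
        "format": "text",
    }
    if min_mag is not None:
        params["minmagnitude"] = min_mag
    return f"{base}/fdsnws/event/1/query?{urlencode(params)}"


def wms_getmap_url(base, layer, bbox, width=800, height=600):
    """Build a WMS 1.3.0 GetMap URL.

    bbox is (min_lat, min_lon, max_lat, max_lon): WMS 1.3.0 uses
    latitude-first axis order for EPSG:4326.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "CRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return f"{base}?{urlencode(params)}"


# Illustrative values only: earthquakes since 1980 in a rough box around Norway.
events_url = fdsn_event_url(FDSN_BASE, "1980-01-01", "2021-04-27",
                            57.0, 72.0, 0.0, 35.0, min_mag=3.0)
map_url = wms_getmap_url(WMS_BASE, "example_layer", (57.0, 0.0, 72.0, 35.0))
print(events_url)
print(map_url)
```

Overlaying the returned event list on the returned map image is the kind of cross-discipline comparison the portal performs interactively.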

The EPOS-N Portal is implemented by adapting Enlighten-web, a server-client program developed by NORCE. Enlighten-web facilitates interactive visual analysis of large multidimensional data sets, and supports interactive mapping of millions of points. The Enlighten-web client runs inside a web browser. An important element in the Enlighten-web functionality is brushing and linking, which is useful for exploring complex data sets to discover correlations and interesting properties hidden in the data. The views are linked to each other, so that highlighting a subset in one view automatically leads to the corresponding subsets being highlighted in all other linked views.

How to cite: Michalek, J., Atakan, K., Rønnevik, C., Indrøy, H., Ottemøller, L., Natvik, Ø., Langeland, T., Lampe, O. D., Fonnes, G., Cook, J., Christensen, J. M., Baadshaug, U., Kierulf, H. P., Grøtan, B.-O., Olesen, O., Dehls, J., and Maupin, V.: EPOS-Norway Portal, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15394, https://doi.org/10.5194/egusphere-egu21-15394, 2021.

11:17–11:19

EGU21-15205

SIOS Data Management System: distributed data system for Earth System Science

Dariusz Ignatiuk, Øystein Godøy, Lara Ferrighi, Inger Jennings, Christiane Hübner, Shridhar Jawak, and Heikki Lihavainen

Svalbard Integrated Arctic Earth Observing System (SIOS) is an international consortium to develop and maintain a regional observing system in Svalbard and the associated waters. SIOS brings together the existing infrastructure and data of its members into a multidisciplinary network dedicated to answering Earth System Science (ESS) questions related to global change. The Observing System is built around “SIOS core data” – long-term data series collected by SIOS partners. SIOS Data Management System (SDMS) is dedicated to harvesting information on historical and current datasets from collaborating thematic and institutional data centres and making them available to users. A central data access portal is linked to the data repositories maintained by SIOS partners, which manage and distribute data sets and their associated metadata. The integrity of the information and harmonisation of data is based on internationally accepted protocols assuring interoperability of data, standardised documentation of data through the use of metadata and standardised interfaces by data systems through the discovery of metadata. By these means, SDMS is working towards FAIR data compliance (making data findable, accessible, interoperable and reusable), among other initiatives through the H2020 funded ENVRI-FAIR project (http://envri.eu/envri-fair/).

How to cite: Ignatiuk, D., Godøy, Ø., Ferrighi, L., Jennings, I., Hübner, C., Jawak, S., and Lihavainen, H.: SIOS Data Management System: distributed data system for Earth System Science, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15205, https://doi.org/10.5194/egusphere-egu21-15205, 2021.

11:19–11:21

EGU21-9400

The brokering framework empowering WMO Hydrological Observing System (WHOS)

Enrico Boldrini, Paolo Mazzetti, Fabrizio Papeschi, Roberto Roncella, Mattia Santoro, Massimiliano Olivieri, Stefano Nativi, Silvano Pecora, Igor Chernov, and Claudio Caponi

The WMO Commission of Hydrology (CHy) is realizing the WMO Hydrological Observing System (WHOS), a software (and human) framework with the aim of improving sharing of hydrological data and knowledge worldwide.

National Hydrological Services (NHS) are already sharing on the web (both archived and near real time) data collected in each country, using disparate publication services. WHOS is leveraging the Discovery and Access Broker (DAB) technology developed and operated in its cloud infrastructure by CNR-IIA to realize WHOS-broker, a key component of the WHOS architecture. WHOS-broker is in charge of harmonizing the available and heterogeneous metadata, data and services, making the already published information more accessible to scientists (e.g. modelers), decision makers and the general public worldwide.

WHOS-broker supports many service interfaces and APIs that hydrological application builders can already leverage, for example OGC SOS, OGC CSW, OGC WMS, ESRI Feature Service, CUAHSI WaterOneFlow, DAB REST API, USGS RDB, OAI-PMH/WIGOS and THREDDS. New APIs and service protocols are continuously added to support new applications, since WHOS-broker is a modular and flexible framework that aims to enable interoperability and to maintain it as standards change and evolve over time.

Three target programmes have already benefited from WHOS:

La Plata river basin: hydro and meteo data from Argentina, Bolivia, Brazil, Paraguay and Uruguay are harmonized and shared by WHOS-broker to the benefit of different applications. One of them is the Plata Basin Hydrometeorological Forecasting and Early Warning System (PROHMSAT-Plata model, developed by HRC), based on CUAHSI WaterOneFlow and experts from the five countries.
Arctic-HYCOS: hydro data from Canada, Finland, Greenland, Iceland, Norway, Russia and the United States are harmonized and shared by WHOS-broker to the benefit of different applications. One of them is the WMO HydroHub Arctic portal, based on ESRI technologies.
Dominican Republic: hydro and meteo data of the Dominican Republic published by different originators are being harmonized by WHOS-broker to the benefit of different applications. One of them is the Met data explorer application developed by BYU, based on the THREDDS catalog service.

The three programmes should act as a driving force for more to follow, by demonstrating possible applications that can be built on top of WHOS.

The public launch of the official WHOS homepage at WMO is expected by mid-2021 and will include:

A dedicated web portal based on Water Data Explorer application developed by BYU
Results from the three programs
Detailed information on how to access WHOS data by using one of the many WHOS-broker service interfaces
An online training course for data providers interested in WHOS
The WHOS Hydro Ontology, leveraged by WHOS-broker in order to both semantically augment user queries and harmonize results (e.g. in case of synonyms of the same concept in different languages).

How to cite: Boldrini, E., Mazzetti, P., Papeschi, F., Roncella, R., Santoro, M., Olivieri, M., Nativi, S., Pecora, S., Chernov, I., and Caponi, C.: The brokering framework empowering WMO Hydrological Observing System (WHOS), EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-9400, https://doi.org/10.5194/egusphere-egu21-9400, 2021.

11:21–11:23

EGU21-15148

Teams Win: The European Datacube Federation

Peter Baumann

Collaboration requires some minimum of common understanding; in the case of Earth data, in particular common principles making data interchangeable, comparable, and combinable. Open standards help here; in the case of Big Earth Data, specifically the OGC/ISO Coverages standard. This unifying framework covers in particular regular and irregular spatio-temporal datacubes. Services grounded in such common understanding have proven more uniform to access and handle, implementing a principle of "minimal surprise" for users visiting different portals while using their favourite clients. Data combination and fusion benefit from canonical metadata allowing automatic alignment, e.g., between 2D DEMs, 3D satellite image time series, 4D atmospheric data, etc.

The EarthServer datacube federation is showing the way towards unleashing the full potential of pixels for supporting the UN Sustainable Development Goals, local governance, and also businesses. EarthServer is an open, free, transparent, and democratic network of data centers offering dozens of Petabytes of a critical variety, such as radar and optical Copernicus data, atmospheric data, elevation data, and thematic cubes like global sea ice. Data centers like DIASs and CODE-DE, research organizations, companies, and agencies have teamed up in EarthServer. Strictly based on the open OGC standards, an ecosystem of data has been established that is available to users as a single pool, without the need for any coding skills (such as Python). A unique capability is location transparency: clients can fire their query against any of the members, and the federation nodes will figure out the optimal work distribution irrespective of data location.

The underlying datacube engine, rasdaman, enables all datacube access, analytics, and federation. Query evaluation is optimized automatically, applying highly efficient, intelligent, rule-based methods in homogeneous and heterogeneous mashups, up to satellite on-board deployments as done in the ORBiDANSe project. Users perceive one single, common information space accessible through a wide spectrum of open-source and proprietary clients.

In our talk we present technology, services, and governance of this unique line-up of data centers. A demo will show distributed datacube fusion live.

How to cite: Baumann, P.: Teams Win: The European Datacube Federation, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15148, https://doi.org/10.5194/egusphere-egu21-15148, 2021.

Quality Control

11:23–11:25

EGU21-23

Towards Developing Community Guidelines for Sharing and Reusing Quality Information of Earth Science Datasets

Carlo Lacagnina, Ge Peng, Robert R. Downs, Hampapuram Ramapriyan, Ivana Ivanova, David F. Moroni, Yaxing Wei, Lucy Bastin, Nancy A. Ritchey, Gilles Larnicol, Lesley A. Wyborn, Chung-Lin Shie, Ted Habermann, Anette Ganske, Sarah M. Champion, Mingfang Wu, Irina Bastrakova, Dave Jones, and Gary Berg-Cross

The knowledge of data quality and the quality of the associated information, including metadata, is critical for data use and reuse. Assessment of data and metadata quality is key for ensuring credible available information, establishing a foundation of trust between the data provider and various downstream users, and demonstrating compliance with requirements established by funders and federal policies.

Data quality information should be consistently curated, traceable, and adequately documented to provide sufficient evidence to guide users to address their specific needs. The quality information is especially important for data used to support decisions and policies, and for enabling data to be truly findable, accessible, interoperable, and reusable (FAIR).

Clear documentation of the quality assessment protocols used can promote the reuse of quality assurance practices and thus support the generation of more easily-comparable datasets and quality metrics. To enable interoperability across systems and tools, the data quality information should be machine-actionable. Guidance on the curation of dataset quality information can help to improve the practices of various stakeholders who contribute to the collection, curation, and dissemination of data.

This presentation outlines a global community effort to develop international guidelines to curate data quality information that is consistent with the FAIR principles throughout the entire data life cycle and inheritable by any derivative product.

How to cite: Lacagnina, C., Peng, G., Downs, R. R., Ramapriyan, H., Ivanova, I., Moroni, D. F., Wei, Y., Bastin, L., Ritchey, N. A., Larnicol, G., Wyborn, L. A., Shie, C.-L., Habermann, T., Ganske, A., Champion, S. M., Wu, M., Bastrakova, I., Jones, D., and Berg-Cross, G.: Towards Developing Community Guidelines for Sharing and Reusing Quality Information of Earth Science Datasets, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-23, https://doi.org/10.5194/egusphere-egu21-23, 2021.

11:25–11:27

EGU21-10547

Data flow, harmonization, and quality control

Brenner Silva, Philipp Fischer, Sebastian Immoor, Rudolf Denkmann, Marion Maturilli, Philipp Weidinger, Steven Rehmcke, Tobias Düde, Norbert Anselm, Peter Gerchow, Antonie Haas, Christian Schäfer-Neth, Angela Schäfer, Stephan Frickenhaus, and Roland Koppe and the Computing and Data Centre of the Alfred-Wegener-Institute

Earth system cyberinfrastructures include three types of data services: repositories, collections, and federations. These services arrange data by their purpose, level of integration, and governance. For instance, registered data of uniform measurements fulfill the goal of publication but do not necessarily flow in an integrated data system. The data repository provides the first and high level of integration that strongly depends on the standardization of incoming data. One example here is the framework Observation to Archive and Analysis (O2A) that is operational and continuously developed at the Alfred-Wegener-Institute, Bremerhaven. A data repository is one of the components of the O2A framework and much of its functionality depends on the standardization of the incoming data. In this context, we focus on the development of a modular approach to provide the standardization and quality control for the monitoring of the near real-time data. Two modules are under development. First, the driver module transforms different tabular data to a common format. Second, the quality control module runs the quality tests on the ingested data. Both modules rely on the sensor operator and on the data scientist, two actors that interact with both ends of the ingest component of the O2A framework (http://data.awi.de/o2a-doc). We demonstrate the driver and the quality control modules in the data flow within Digital Earth showcases that also connect repositories and federated databases to the end-user. The end-user is the scientist, who is closely involved in the development approach to ensure applicability. The result is the proven benefit of harmonizing data and metadata of multiple sources, easy integration and rapid assessment of the ingested data. Further, we discuss concepts and current development that aim at the enhanced monitoring and scientific workflow.
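The two-module idea described above (a driver that maps tabular input to a common format, followed by a quality control step) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the O2A implementation: the function names, the column mapping, the flag values and the plausible-range limits are all hypothetical.

```python
import csv
import io


def driver(raw_text, delimiter, column_map):
    """Driver step: parse delimited text and rename columns to a common schema.

    column_map maps common names to source column names; 'time' is kept as
    text, every other mapped column is converted to float.
    """
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=delimiter)
    records = []
    for row in reader:
        record = {"time": row[column_map["time"]]}
        for common_name, source_name in column_map.items():
            if common_name != "time":
                record[common_name] = float(row[source_name])
        records.append(record)
    return records


def range_test(records, parameter, low, high):
    """Quality control step: flag values outside [low, high] as 'suspect'."""
    flagged = []
    for record in records:
        ok = low <= record[parameter] <= high
        flagged.append({**record, f"{parameter}_flag": "good" if ok else "suspect"})
    return flagged


# Hypothetical sensor export with a semicolon delimiter and local column names.
raw = "Date;T_air\n2021-04-27T11:00:00Z;-3.2\n2021-04-27T11:10:00Z;85.0\n"
records = driver(raw, ";", {"time": "Date", "air_temperature": "T_air"})
checked = range_test(records, "air_temperature", -60.0, 50.0)
for row in checked:
    print(row)
```

Keeping the driver and the test separate means a new sensor format only needs a new column mapping, while the quality tests keep operating on the common schema.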

How to cite: Silva, B., Fischer, P., Immoor, S., Denkmann, R., Maturilli, M., Weidinger, P., Rehmcke, S., Düde, T., Anselm, N., Gerchow, P., Haas, A., Schäfer-Neth, C., Schäfer, A., Frickenhaus, S., and Koppe, R. and the Computing and Data Centre of the Alfred-Wegener-Institute: Data flow, harmonization, and quality control, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10547, https://doi.org/10.5194/egusphere-egu21-10547, 2021.

11:27–11:29

EGU21-5663

Improved FAIR Data Publication Quality in Specialized Environmental Data Portals

Ionut Iosifescu Enescu, Gian-Kasper Plattner, Lucia Espona Pernas, Dominik Haas-Artho, and Rebecca Buchholz

Environmental research data from the Swiss Federal Research Institute WSL, an Institute of the ETH Domain, is published through the environmental data portal EnviDat (https://www.envidat.ch). EnviDat actively implements the FAIR (Findability, Accessibility, Interoperability and Reusability) principles and offers guidance and support to researchers throughout the research data publication process.

WSL strives to increase the fraction of environmental data easily available for reuse in the public domain. At the same time, WSL facilitates the publication of high-quality environmental research datasets by providing an appropriate infrastructure and a formal publication process, and by assigning Digital Object Identifiers (DOIs) and appropriate citation information.

Within EnviDat, we conceptualize and implement data publishing workflows that include automatic validation, interactive quality checks, and iterative improvement of metadata quality. The data publication workflow encompasses a number of steps, starting from the request for a DOI, to an approval process with a double-checking principle, and the submission of the metadata-record to DataCite for the final data publication. This workflow can be viewed as a decentralized peer-review and quality improvement process for safeguarding the quality of published environmental datasets. The workflow is being further developed and refined together with partner institutions within the ETH Domain.

We have defined and implemented additional features in EnviDat, such as (i) in-depth tracing of data provenance through related datasets; (ii) the ability to augment published research data with additional resources which support open science such as model codes and software; and (iii) a DataCRediT mechanism designed for specifying data authorship (Collection, Validation, Curation, Software, Publication, Supervision).

We foresee that these developments will help to further improve approaches targeted at modern documentation and exchange of scientific information. This is timely given the increasing expectations that institutions and researchers have towards capabilities of research data portals and repositories in the environmental domain.

How to cite: Iosifescu Enescu, I., Plattner, G.-K., Espona Pernas, L., Haas-Artho, D., and Buchholz, R.: Improved FAIR Data Publication Quality in Specialized Environmental Data Portals, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-5663, https://doi.org/10.5194/egusphere-egu21-5663, 2021.

FAIR Data

11:29–11:31

EGU21-2139

EASYDAB (Earth System Data Branding) for FAIR and Open Data

Anette Ganske, Amandine Kaiser, Angelina Kraft, Daniel Heydebreck, Andrea Lammert, and Hannes Thiemann

As in many scientific disciplines, there are a variety of activities in Earth system sciences that address the important aspects of good research data management. What has not been sufficiently investigated and dealt with so far is the easy discoverability and re-use of quality-checked data. This aspect is taken up by the EASYDAB label.

EASYDAB¹ is a branding, currently under development, for FAIR and open data from the Earth System Sciences. The branding can be adopted by institutions running a data repository which stores data from the Earth System Sciences. EASYDAB is always connected to a research data publication with DataCite DOIs. Data published under EASYDAB are characterized by a high maturity, extensive metadata information and compliance with a comprehensive discipline-specific standard. For these datasets, the EASYDAB logo is added to the landing page of the data repository. Thereby, repositories can indicate their efforts to publish data with high maturity.

The first standard made for EASYDAB is the ATMODAT standard², which has been developed within the AtMoDat³ project (Atmospheric Model Data). It incorporates concrete recommendations and requirements related to the maturity, publication and enhanced FAIRness of atmospheric model data. The requirements are for rich metadata with controlled vocabularies, structured landing pages, file formats (netCDF) and the structure within files. Human- and machine-readable landing pages are a core element of the ATMODAT standard and should hold and present discipline-specific metadata on simulation and variable level.

The ATMODAT standard includes checklists for the data producer and the data curator so that the compliance with the standard can easily be obtained by both sides. To facilitate automatic checking of the netCDF files headers, a checker program will also be provided and published with DOI. Moreover, a checker for the compliance with the requirements for the DOI Metadata will be developed and made openly available.

The integration of standards from other disciplines in the Earth System Sciences, such as oceanography, into EASYDAB is helpful and desirable to improve the re-use of reviewed, high-quality data.

¹www.easydab.de

²https://cera-www.dkrz.de/WDCC/ui/cerasearch/entry?acronym=atmodat_standard_en_v3_0

³www.atmodat.de

How to cite: Ganske, A., Kaiser, A., Kraft, A., Heydebreck, D., Lammert, A., and Thiemann, H.: EASYDAB (Earth System Data Branding) for FAIR and Open Data, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-2139, https://doi.org/10.5194/egusphere-egu21-2139, 2021.

11:31–11:33

EGU21-5965

PalMod-II Data Management Plan: A FAIR-inspired conceptual framework for data simulation, inter-comparison, sharing and publication

Swati Gehlot, Karsten Peters-von Gehlen, and Andrea Lammert

Large-scale transient climate simulations and their intercomparison with paleo data within the German initiative PalMod (www.palmod.de, currently in phase II) provide an exclusive example of applying a Data Management Plan (DMP) to conceptualise data workflows within and outside a large multidisciplinary project. PalMod-II data products include output of three state-of-the-art climate models with various coupling complexities and spatial resolutions simulating the climate of the past 130,000 years. In addition to the long time series of model data, a comprehensive compilation of paleo-observation data (including a model-observation-comparison toolbox, Baudouin et al, 2021 EGU-CL1.2) is envisaged for validation.

Owing to the enormous amount of data coming from models and observations, produced and handled by different groups of scientists spread across various institutions, a dedicated DMP as a living document provides a data-workflow framework for exchange and sharing of data within and outside the PalMod community. The DMP covers the data life cycle within the project starting from its generation (data formats and standards), analysis (intercomparison with models and observations), publication (usage, licences), dissemination (standardised, via ESGF) and finally archiving after the project lifetime. As an active and continually updated document, the DMP ensures the ownership and responsibilities of data subsets of various working groups along with their data sharing/reuse regulations within the working groups in order to ensure a sustained progress towards the project goals.

This contribution discusses the current status and challenges of the DMP for PalMod-II which covers the details of data produced within various working groups, project-wide workflow strategy for sharing and exchange of data, as well as a definition of a PalMod-II variable list for ESGF standard publication. The FAIR (Findability, Accessibility, Interoperability, and Reusability) data principles play a central role and are proposed for the entire life cycle of PalMod-II data products (model and proxy paleo data) for sharing/reuse during and after the project lifetime.

How to cite: Gehlot, S., Peters-von Gehlen, K., and Lammert, A.: PalMod-II Data Management Plan: A FAIR-inspired conceptual framework for data simulation, inter-comparison, sharing and publication, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-5965, https://doi.org/10.5194/egusphere-egu21-5965, 2021.

11:33–11:35

EGU21-8144

A Standard for the FAIR publication of Atmospheric Model Data developed by the AtMoDat Project

Andrea Lammert, Anette Ganske, Amandine Kaiser, and Angelina Kraft

Due to the increasing amount of data produced in science, concepts for data reusability are of immense importance. One aspect is the publication of data in a way that ensures that it is findable, accessible, interoperable and reusable (FAIR¹ principles). However, putting these principles into practice often causes significant difficulties for researchers. Therefore some repositories accept datasets described only with the minimum metadata required for DOI allocation. Unfortunately, this minimum does not contain enough information to conform to the FAIR principles, so many research data cannot be reused despite having a DOI. In contrast, other repositories aid the researchers by providing advice and strictly controlling the data and their metadata. To simplify the process of defining the needed amount of metadata and of controlling the data and metadata, the AtMoDat² (Atmospheric Model Data) project developed a detailed standard for the FAIR publication of atmospheric model data.

For this purpose we have developed a concept for the “ideal” description of atmospheric model data. A prerequisite for this is the data publication with a DataCite DOI. The ATMODAT standard³ was developed to implement this concept. The standard defines the data format as NetCDF, mandatory metadata (for DOI, landing page and data header), and naming conventions used in climate research - the Climate and Forecast conventions (CF-conventions⁴). However, many variable names used in urban climate research, for example, are not part of the CF-conventions. For this, standard names have to be defined together with the community and the inclusion in the list of CF-conventions has to be requested. Furthermore we developed and published Python routines which allow data producers as well as repositories to check model output data against the standard.
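To give a sense of what checking model output headers against such a standard involves, the following minimal sketch validates a netCDF-like header represented as plain Python dictionaries. It is not the checker published by the AtMoDat project: the lists of required attributes are hypothetical stand-ins, and a real check would read the attributes from the file and follow the standard's own requirement tables.

```python
# Hypothetical requirement lists for illustration; the ATMODAT standard
# defines its own mandatory and recommended attributes.
REQUIRED_GLOBAL_ATTRS = ("Conventions", "title", "institution", "source")
REQUIRED_VARIABLE_ATTRS = ("standard_name", "units")


def check_header(global_attrs, variables):
    """Return a list of human-readable problems found in a netCDF-like header."""
    problems = []
    for name in REQUIRED_GLOBAL_ATTRS:
        if not global_attrs.get(name):
            problems.append(f"missing global attribute: {name}")
    conventions = global_attrs.get("Conventions")
    if conventions and "CF-" not in conventions:
        problems.append("Conventions attribute does not reference CF")
    for var_name, attrs in variables.items():
        for name in REQUIRED_VARIABLE_ATTRS:
            if not attrs.get(name):
                problems.append(f"variable {var_name}: missing attribute {name}")
    return problems


# Illustrative header: one well-described variable and one without a standard name.
header = {
    "Conventions": "CF-1.8",
    "title": "Demo simulation",
    "institution": "Example University",
    "source": "Example model",
}
variables = {
    "tas": {"standard_name": "air_temperature", "units": "K"},
    "foo": {"units": "1"},
}
problems = check_header(header, variables)
print(problems)
```

A variable such as `foo` here corresponds to the case described above, where no suitable standard name exists yet and one has to be agreed with the community.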

The ATMODAT standard will first be applied by the project partners of the two participating universities (Universities of Hamburg and Leipzig). Here, climate model data are processed with a post-processor in preparation for publication. Subsequently, the files including the specified metadata for the DataCite metadata schema will be published by the World Data Center for Climate⁵ (WDCC). Data fulfilling the ATMODAT standard will be marked on the landing page with a special EASYDAB⁶ (Earth System Data Branding) logo. EASYDAB is a branding, currently under development, for FAIR and open data from the Earth System Sciences. The logo indicates to future data users that the dataset is a verified dataset that can be easily reused. The standardization of the data and the further steps are easily transferable to data from other disciplines.

1 Wilkinson, M., Dumontier, M., Aalbersberg, I. et al.: The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18

2 https://www.atmodat.de/

3 https://cera-www.dkrz.de/WDCC/ui/cerasearch/entry?acronym=atmodat_standard_en_v3_0

4 https://cfconventions.org/

5 https://cera-www.dkrz.de/WDCC/ui/cerasearch/

6 https://www.easydab.de/

How to cite: Lammert, A., Ganske, A., Kaiser, A., and Kraft, A.: A Standard for the FAIR publication of Atmospheric Model Data developed by the AtMoDat Project, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8144, https://doi.org/10.5194/egusphere-egu21-8144, 2021.

Discussion

11:35–11:37

EGU21-13155

ECS

The I-ADOPT Interoperability Framework: a proposal for FAIRer observable property descriptions

Barbara Magagna, Gwenaelle Moncoiffe, Maria Stoica, Anusuriya Devaraju, Alison Pamment, Sirko Schindler, and Robert Huber

Global environmental challenges like climate change, pollution, and biodiversity loss are complex. To understand environmental patterns and processes and address these challenges, scientists require observations of natural phenomena at various temporal and spatial scales and across many domains. The research infrastructures and scientific communities involved in these activities often follow their own data management practices, which inevitably leads to a high degree of variability and incompatibility of approaches. Consequently, a variety of metadata standards and vocabularies have been proposed to describe observations and are actively used in different communities. However, this diversity in approaches now causes severe interoperability issues across datasets and hampers their exploitation as a common data source.

Projects like ENVRI-FAIR, FAIRsFAIR, FAIRplus are addressing this difficulty by working on the full integration of services across research infrastructures based on FAIR Guiding Principles supporting the EOSC vision towards an open research culture. Beyond these projects, we need collaboration and community consensus across domains to build a common framework for representing observable properties. The Research Data Alliance InteroperAble Descriptions of Observable Property Terminology Working Group (RDA I-ADOPT WG) was formed in October 2019 to address this need. Its membership covers an international representation of terminology users and terminology providers, including terminology developers, scientists, and data centre managers. The group’s overall objective is to deliver a common interoperability framework for observable property variables within its 18-month work plan. Starting with the collection of user stories from research scientists, terminology managers, and data managers or aggregators, we drafted a set of technical and content-related requirements. A survey of terminology resources and annotation practices provided us with information about almost one hundred terminologies, a subset of which was then analysed to identify existing conceptualisation practices, commonalities, gaps, and overlaps. This was then used to derive a conceptual framework to support their alignment.

In this presentation, we will introduce the I-ADOPT Interoperability Framework highlighting its semantic components. These represent the building blocks for specific ontology design patterns addressing different use cases and varying degrees of complexity in describing observed properties. We will demonstrate the proposed design patterns using a number of essential climate and essential biodiversity variables. We will also show examples of how the I-ADOPT framework will support interoperability between existing representations. This work will provide the semantic foundation for the development of more user-friendly data annotation tools capable of suggesting appropriate FAIR terminologies for observable properties.

How to cite: Magagna, B., Moncoiffe, G., Stoica, M., Devaraju, A., Pamment, A., Schindler, S., and Huber, R.: The I-ADOPT Interoperability Framework: a proposal for FAIRer observable property descriptions, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13155, https://doi.org/10.5194/egusphere-egu21-13155, 2021.

11:37–11:39

EGU21-15922

F-UJI : An Automated Tool for the Assessment and Improvement of the FAIRness of Research Data

Robert Huber and Anusuriya Devaraju

Making research data FAIR (Findable, Accessible, Interoperable, and Reusable) is critical to maximizing its impact. However, since the FAIR principles are designed as guidelines and do not specify implementation rules, it is difficult to verify how well they are put into practice. Therefore, metrics and associated tools need to be developed to enable the assessment of FAIR compliance of services and datasets. Such practical solutions are important for many stakeholders to assess the quality of data-related services. They are important for selecting such services, but can also be used to iteratively improve data offerings, e.g., as part of FAIR advisory processes. With the increasing number of published datasets and the need to test them repeatedly, there is a growing body of literature that recognizes the importance of automated FAIR assessment tools. Our goal is to contribute to this area through the development of an open source tool called F-UJI. F-UJI supports programmatic FAIR assessment of research data based on a set of core metrics against which the implementation of FAIR principles can be assessed. This paper presents the development and application of F-UJI and the underlying metrics. For each of the metrics, we have designed and implemented practical tests based on existing standards and best practices for research data. The tests expand our understanding of how to test FAIR metrics in practice, an aspect that has not been fully addressed in previous work on FAIR data assessment. We demonstrate the use of the tool by assessing several multidisciplinary datasets from selected trusted digital repositories, followed by recommendations for improving the FAIRness of these datasets. We summarize the experience and lessons learned from the development and testing.
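To make the idea of metric-based programmatic assessment concrete, here is a minimal sketch. It is not F-UJI's code or API: the three tests, their identifiers, the metadata record layout and the placeholder DOI are all assumptions made up for illustration.

```python
import re

# Rough pattern for a DOI name (illustrative, not a full validator).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Each hypothetical test maps an identifier to a predicate on a metadata record.
TESTS = {
    "F-persistent-identifier": lambda rec: bool(DOI_PATTERN.match(rec.get("identifier", ""))),
    "A-access-level-stated": lambda rec: rec.get("access") in {"public", "restricted", "embargoed"},
    "R-license-present": lambda rec: bool(rec.get("license")),
}


def assess(record):
    """Run every test and return per-test results plus the fraction passed."""
    results = {name: test(record) for name, test in TESTS.items()}
    score = sum(results.values()) / len(results)
    return results, score


# Placeholder record: has an identifier and an access level, but no license.
record = {"identifier": "10.1234/example-dataset", "access": "public"}
results, score = assess(record)
for name, passed in results.items():
    print(f"{name}: {'pass' if passed else 'fail'}")
print(f"score: {score:.2f}")
```

Keeping each test as a small named predicate means the per-test results can double as recommendations: every failed test points to one concrete thing to improve.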

How to cite: Huber, R. and Devaraju, A.: F-UJI : An Automated Tool for the Assessment and Improvement of the FAIRness of Research Data, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15922, https://doi.org/10.5194/egusphere-egu21-15922, 2021.

Tools and Services

11:39–11:41

EGU21-2886

CMIP6 data documentation and citation in IPCC's Sixth Assessment Report (AR6)

Martina Stockhause, Robin Matthews, Anna Pirani, Anne Marie Treguier, and Ozge Yelekci

The amount of work and resources invested by the modelling centers to provide CMIP6 (Coupled Model Intercomparison Project Phase 6) experiments and climate projection datasets is huge, and therefore it is extremely important that the teams receive proper credit for their work. The Citation Service makes CMIP6 data citable with DOI references for the evolving CMIP6 model data published in the Earth System Grid Federation (ESGF). The Citation Service, a new piece of the CMIP6 infrastructure, was developed at the request of the CMIP Panel.

CMIP6 provides new global climate model data assessed in the IPCC's (Intergovernmental Panel on Climate Change) Sixth Assessment Report (AR6). Led by the Technical Support Unit of IPCC Working Group I (WGI TSU), the IPCC Task Group on Data Support for Climate Change Assessment (TG-Data) developed FAIR data guidelines for implementation by the TSUs of the three IPCC WGs and the IPCC Data Distribution Centre (DDC) Partners. A central part of the FAIR data guidelines is the documentation and citation of data used in the report.

The contribution will show how CMIP6 data usage is documented in IPCC WGI AR6 from three angles: technical implementation, collection of CMIP6 data usage information from the IPCC authors, and a report users’ perspective.

Links:

CMIP6 Citation Service: http://cmip6cite.wdc-climate.de
CMIP6: https://pcmdi.llnl.gov/CMIP6/
IPCC AR6: https://www.ipcc.ch/assessment-report/ar6/
IPCC AR6 WGI report: https://www.ipcc.ch/report/sixth-assessment-report-working-group-i/
IPCC TG-Data: https://www.ipcc.ch/data/

How to cite: Stockhause, M., Matthews, R., Pirani, A., Treguier, A. M., and Yelekci, O.: CMIP6 data documentation and citation in IPCC's Sixth Assessment Report (AR6), EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-2886, https://doi.org/10.5194/egusphere-egu21-2886, 2021.

11:41–11:43

EGU21-5492

Raiders of the Lost Code: Preserving the MOSS Codebase - Significance, Status, Challenges and Opportunities

Peter Löwe, Māris Nartišs, and Carl N Reed

We report on the current status of the software repository of the Map Overlay and Statistical System (MOSS) and upcoming actions to ensure long term preservation of the codebase as a historic geospatial source. MOSS is the earliest known open source Geographic Information System (GIS). Active development of the vector-based interactive GIS by the U.S. Department of Interior began in 1977 on a CDC mainframe computer located at Colorado State University. Development continued until 1985 with MOSS being ported to multiple platforms, including DG-AOS, UNIX, VMS and Microsoft DOS. Many geospatial programming techniques and functionalities were first implemented in MOSS, including a fully interactive user interface and integrated vector and raster processing. The public availability of the WWW in the early 1990s sparked a growth of new Open Source GIS projects, which led to the formation of the Open Source Geospatial Foundation (OSGeo). The goal of OSGeo is to support and promote the collaborative development of open geospatial technologies and data. This includes best practices for project management and repositories for codebases. From its start, OSGeo recognised MOSS as the original forerunner project. After active use of MOSS declined from the 1990s onwards, the U.S. Bureau of Land Management (BLM) continued to provide the open source MOSS codebase on an FTP server, which allowed use, analysis and reference by URL. This service was discontinued at some point before 2018, which was eventually discovered through a broken URL. This led to a global search and rescue effort among the OSGeo communities to track down remaining offline copies of the codebase. In mid 2020 a surviving copy of the MOSS codebase was discovered at the University of Latvia, and it is temporarily preserved at the German Institute of Economic Research (DIW Berlin). OSGeo has agreed to make MOSS the first OSGeo Heritage Project to ensure long term preservation in an OSGeo code repository.
This is a significant first step to enable MOSS-related research based on the FAIR (Findable, Accessible, Interoperable, Reusable) paradigm. Follow-up actions will be required to enable scientific citation and credit by persistent identifiers for code and persons, such as Digital Object Identifiers (DOI) and Open Researcher and Contributor IDs (ORCID iDs), within the OSGeo repository environment. This will also advance the OSGeo portfolio of best practices for other open geospatial projects.

How to cite: Löwe, P., Nartišs, M., and Reed, C. N.: Raiders of the Lost Code: Preserving the MOSS Codebase - Significance, Status, Challenges and Opportunities, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-5492, https://doi.org/10.5194/egusphere-egu21-5492, 2021.

11:43–11:45

EGU21-8294

Enabling “LiDAR data processing” as a service in a Jupyter environment

Spiros Koulouzis, Yifang Shi, Yuandou Wan, Riccardo Bianchi, Daniel Kissling, and Zhiming Zhao

Airborne Laser Scanning (ALS) data derived from Light Detection And Ranging (LiDAR) technology allow the construction of Essential Biodiversity Variables (EBVs) of ecosystem structure with high resolution at landscape, national and regional scales. Researchers nowadays often process such data and rapidly prototype using scripting languages like R or Python, and share their experiments via scripts or, more recently, via notebook environments such as Jupyter. To scale experiments to large data volumes, extra data sources, or new models, researchers often employ Cloud infrastructures to enhance notebooks (e.g. JupyterHub) or execute the experiments as a distributed workflow. In many cases, a researcher has to encapsulate subsets of the code (namely, cells in Jupyter) from the notebook as components to be included in the workflow. However, it is usually time-consuming and burdensome for the researcher to encapsulate those components according to the workflow system's specific interface, and the Findability, Accessibility, Interoperability and Reusability (FAIR) of those components is often limited. We aim to enable the public cloud processing of massive amounts of ALS data across countries and regions and make the retrieval and uptake of such EBV data products of ecosystem structure easily available to a wide scientific community and stakeholders.

We propose and develop a tool called FAIR-Cells that can be integrated into the JupyterLab environment as an extension to help scientists and researchers improve the FAIRness of their code. It can encapsulate user-selected cells of code as standardized RESTful API services, and it allows users to containerize such Jupyter code cells and to publish them as reusable components via community repositories.

We demonstrate the features of FAIR-Cells using an application from the ecology domain. Ecologists currently process various point cloud datasets derived from LiDAR to extract metrics that capture vegetation's vertical and horizontal structure. A new open-source software called ‘Laserchicken’ allows the processing of country-wide LiDAR datasets in a local environment (e.g. the Dutch national ICT infrastructure called SURF). However, the users have to use the Laserchicken application as a whole to process the LiDAR data. The capacity of the given infrastructure also limits the volume of data. In this work, we will first demonstrate how a user can apply the FAIR-Cells extension to interactively create RESTful services for the components in the Laserchicken software in a Jupyter environment, to automate the encapsulation of those services as Docker containers, and to publish the services in a community catalogue (e.g. LifeWatch) via the API (based on GeoNetwork). We will then demonstrate how those containers can be assembled as a workflow (e.g. using Common Workflow Language) and deployed on the cloud environment (offered by the EOSC early adopter program for ENVRI-FAIR) to process a much bigger dataset than in a local environment. The demonstration results suggest that our approach's technical roadmap can achieve FAIRness and exhibit good parallelism on large distributed volumes of data when executing Jupyter-based code.
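The first step, exposing the code of a notebook cell as a RESTful service, can be sketched with the Python standard library alone. This is only an illustration of the general idea, not how FAIR-Cells is implemented: the function `mean_height` is a hypothetical stand-in for a cell and is not part of Laserchicken, and the helpers `as_rest_service` and `call` are invented for this example. Containerization and catalogue publication are not covered.

```python
import io
import json
from wsgiref.util import setup_testing_defaults


def mean_height(points):
    """Hypothetical stand-in for a notebook cell: mean z of (x, y, z) points."""
    return sum(p[2] for p in points) / len(points)


def as_rest_service(func):
    """Wrap a function taking one JSON argument as a minimal WSGI application."""
    def app(environ, start_response):
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        body = json.dumps({"result": func(payload)}).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    return app


def call(app, payload):
    """Invoke the WSGI app in-process, without opening a network socket."""
    data = json.dumps(payload).encode("utf-8")
    environ = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(data)),
               "wsgi.input": io.BytesIO(data)}
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    status = []
    chunks = app(environ, lambda s, headers: status.append(s))
    return status[0], json.loads(b"".join(chunks))


service = as_rest_service(mean_height)
print(call(service, [[0, 0, 1.0], [1, 0, 3.0]]))
```

The same `service` object could be served over HTTP with `wsgiref.simple_server.make_server`, which is the kind of entry point a container image would start.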

How to cite: Koulouzis, S., Shi, Y., Wan, Y., Bianchi, R., Kissling, D., and Zhao, Z.: Enabling “LiDAR data processing” as a service in a Jupyter environment, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8294, https://doi.org/10.5194/egusphere-egu21-8294, 2021.

Discussion

11:45–11:47

EGU21-8418

An online service for analysing ozone trends within EOSC-synergy

Tobias Kerzenmacher, Valentin Kozlov, Borja Sanchis, Ugur Cayoglu, Marcus Hardt, and Peter Braesicke

The European Open Science Cloud-Synergy (EOSC-Synergy) project delivers services that serve to expand the use of EOSC. One of these services, O3as, is being developed for scientists using chemistry-climate models to determine time series and eventually ozone trends for potential use in the quadrennial Global Assessment of Ozone Depletion, which will be published in 2022. A unified approach from a service like ours, which analyses results from a large number of different climate models, helps to harmonise the calculation of ozone trends efficiently and consistently. With O3as, publication-quality figures can be reproduced quickly and in a coherent way. This is done via a web application where users configure their queries to perform simple analyses. These queries are passed to the O3as service via an O3as REST API call. There, the O3as service processes the query and accesses the reduced dataset. To create a reduced dataset, regular tasks are executed on a high performance computer (HPC) to copy the primary data and perform data preparation (e.g. data reduction, standardisation and parameter unification). O3as uses EGI check-in (OIDC) to identify users and grant access to certain functionalities of the service, udocker (a tool to run Docker containers in multi-user space without root privileges) to perform data reduction in the HPC environment, and the Universitat Politècnica de València (UPV) Infrastructure Manager to provision service resources (Kubernetes).

How to cite: Kerzenmacher, T., Kozlov, V., Sanchis, B., Cayoglu, U., Hardt, M., and Braesicke, P.: An online service for analysing ozone trends within EOSC-synergy, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8418, https://doi.org/10.5194/egusphere-egu21-8418, 2021.

11:47–11:49

EGU21-15669

Applying VocPrez to operational semantic repositories: the NVS experience

Alexandra Kokkinaki, Quyen Luong, Christopher Thompson, Nicholas Car, and Gwenaelle Moncoiffe

The Natural Environment Research Council’s (NERC) Vocabulary Server (NVS¹) has been serving the marine and wider community with controlled vocabularies for over a decade. NVS provides access to standardised lists of terms which are used for data mark-up, facilitating interoperability and discovery in the marine and associated earth science domains. The NVS controlled vocabularies are published as Linked Data on the web using the data model of the Simple Knowledge Organisation System (SKOS). They can also be accessed as web services (RESTful, SOAP) or through a SPARQL endpoint. NVS is an operational semantic repository, which underpins data systems like SeaDataNet, the pan-European infrastructure of marine data management, and is embedded in SeaDataNet-specific tools like MIKADO. Its services are constantly monitored by the SeaDataNet Argo monitoring system to ensure reliability and availability. In this presentation we will discuss the challenges we encountered while enhancing an operational semantic repository like NVS with VocPrez, a read-only web delivery system for SKOS-formulated RDF vocabularies. We will also present our approach to implementing CI/CD delivery and the added value of VocPrez to NVS in terms of FAIRness. Finally, we will discuss the lessons learnt during the lifecycle of this development.

VocPrez² is an open-source, pure-Python application that reads vocabularies from one or more sources and presents them online (HTTP) in several different ways: as human-readable web pages, using simple HTML templates for different SKOS objects, and as machine-readable RDF or other formats, using mapping code. The different information model views supported by VocPrez are defined by profiles, that is, by formal specifications. VocPrez supports both different profiles and different formats (Media Types) for each profile.

VocPrez enhanced the publication of NVS both for human users and machines. Humans accessing NVS are presented with a new look and feel that is more user friendly, providing filtering of collections, concepts and thesauri, and sorting of results using different options. For machine-to-machine communication, VocPrez presents NVS content in machine-readable formats which Internet clients can request directly using the Content Negotiation by Profile standard³. The profiles and formats available are also listed on an “Alternate Profiles” web page, which is automatically generated per resource, thus allowing for discovery of options. As a result, human or machine end users can access NVS collections, thesauri and concepts according to different information models such as DCAT, NVS’ own vocabulary model or pure SKOS, and also in different serializations like JSON-LD, Turtle, etc. using content negotiation.
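From a client's point of view, such a request can be sketched as below. The example only builds the request and does not send it. It assumes the HTTP-header form of Content Negotiation by Profile, in which the desired profile is named in an `Accept-Profile` header alongside the usual `Accept` header; the collection path and the profile URI are placeholders, and the values a server actually accepts should be taken from its “Alternate Profiles” page.

```python
from urllib.request import Request


def build_profile_request(resource_uri, profile_uri, media_type):
    """Build (but do not send) a GET request for one profile and one format."""
    return Request(
        resource_uri,
        headers={
            "Accept": media_type,
            # The profile URI is wrapped in angle brackets in this header.
            "Accept-Profile": f"<{profile_uri}>",
        },
        method="GET",
    )


request = build_profile_request(
    "http://vocab.nerc.ac.uk/collection/EXAMPLE/current/",  # placeholder path
    "http://www.w3.org/2004/02/skos/core#",                 # placeholder profile URI
    "text/turtle",
)
print(request.get_method(), request.full_url)
# urllib normalises the capitalisation of header names; HTTP treats them
# as case-insensitive, so this does not change the request's meaning.
for name, value in request.header_items():
    print(f"{name}: {value}")
```

Passing the request to `urllib.request.urlopen` would send it; the response headers would then indicate which profile and media type the server chose.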

¹http://vocab.nerc.ac.uk/

²https://github.com/RDFLib/VocPrez

³https://www.w3.org/TR/dx-prof-conneg/

How to cite: Kokkinaki, A., Luong, Q., Thompson, C., Car, N., and Moncoiffe, G.: Applying VocPrez to operational semantic repositories: the NVS experience, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15669, https://doi.org/10.5194/egusphere-egu21-15669, 2021.

11:49–11:51

EGU21-8903

Generic concepts for organising data management in research projects

Ivonne Anders, Swati Gehlot, Andrea Lammert, and Karsten Peters-von Gehlen

For several years, Research Data Management has been becoming an increasingly important part of scientific projects, regardless of the number of topics or subjects, researchers or institutions involved. The bigger the project, the greater the data organization and data management requirements that must be met to assure the best outcome of the project. Despite this, projects rarely have clear structures or responsibilities for data management. The importance of clearly defining data management, and of budgeting for it, is often underestimated and/or neglected. Only a few reports and documents that explain the research data management in particular projects and detail best practice examples can be found in the current literature. Additionally, these are often mixed up with topics of general project management. Furthermore, these examples focus strongly on the specific issues of the projects described, which makes it very difficult to transfer the methods provided or to apply them more generally.

This contribution presents generic concepts of research data management with an effort to separate them from general project management tasks. Project size, the diversity of topics and the researchers involved play an important role in shaping data management and determining which methods of data management can add value to the outcome of a project. We especially focus on different organisation types, including roles and responsibilities for data management in projects of different sizes. Additionally, we show how and when education should be included, and how important agreements within a project are.

How to cite: Anders, I., Gehlot, S., Lammert, A., and Peters-von Gehlen, K.: Generic concepts for organising data management in research projects, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8903, https://doi.org/10.5194/egusphere-egu21-8903, 2021.

11:51–12:30

Meet the authors in their breakout text chats