ESSI2.8 | Research data infrastructures in ESS - Bridging the gap between user needs and sustainable software solutions, and linking approaches
Co-organized by EOS4
Convener: Christian Pagé | Co-conveners: Claudia Müller, Christin Henzen, Heinrich Widmann, Kirsten Elger, Kerstin Lehnert
Orals
| Thu, 18 Apr, 16:15–18:00 (CEST)
 
Room G2
Posters on site
| Attendance Thu, 18 Apr, 10:45–12:30 (CEST) | Display Thu, 18 Apr, 08:30–12:30
 
Hall X2
Research data infrastructures (RDIs) serve to manage and share research products in a systematic way to enable research across all scales and disciplinary boundaries. Their services support researchers in data management and collaborative analysis throughout the entire data lifecycle.

For this, fostering FAIRness and openness, e.g. by applying established standards for metadata, data, and/or scientific workflows, is crucial. Through their offerings and services, RDIs can shape research practices and are strongly connected with the communities of users that identify and associate themselves with them.

Naturally, realising the potential of RDIs faces many challenges. Even though it is clear that RDIs are indispensable for solving big societal problems, their wide adoption requires a cultural change within research communities. At the same time, RDIs themselves must be developed further to serve user needs, their sustainability must be improved, international cooperation must be increased, and duplication of development efforts must be avoided. To provide a community of diverse career stages and backgrounds with a convincing infrastructure that is established beyond national and institutional boundaries, new collaboration patterns and funding approaches must be tested so that RDIs can foster cultural change in academia and become a reliable foundation for FAIR and open research. This needs to happen while academia struggles with improving researcher evaluation, with continuing digital disruption, with enhancing scholarly communication, and with diversity, equity, and inclusion.

In Earth System Science (ESS), several research data infrastructures and components are currently being developed on different regional and disciplinary scales, all of which face these challenges at some level.
This session provides a forum to exchange methods, stories, and ideas to enable cultural change and international collaboration in scientific communities, to bridge the gap between user needs and sustainable software solutions, and to link approaches.

Orals: Thu, 18 Apr | Room G2

The oral presentations are given in a hybrid format supported by a Zoom meeting featuring on-site and virtual presentations. The button to access the Zoom meeting appears just before the time block starts.
Chairpersons: Christian Pagé, Claudia Müller, Christin Henzen
16:15–16:20
16:20–16:30
|
EGU24-5149
|
ESSI2.8
|
On-site presentation
Matthew Harrison, Stephen Mobbs, Emma Bee, Helen Peat, Helen Snaith, Sam Pepler, Martin Juckes, and Gordon Blair and the UKRI NERC Environmental Data Service

The Natural Environment Research Council Environmental Data Service (NERC EDS) provides integrated data services across the breadth of NERC’s data holdings and coordinates closer collaboration and development between NERC’s five environmental data centres. Data is central to modern understanding of our environment. Environmental science is underpinned by access to high quality sources of data and data services. As the principal funder of environmental science in the UK, NERC has supported comprehensive data services and policies since its creation over 50 years ago. Today NERC has five Environmental Data Centres embedded within its Research Centres:
• The British Oceanographic Data Centre (BODC) provides data and services across marine science and is embedded within the National Oceanography Centre.
• The Centre for Environmental Data Analysis (CEDA) provides both atmospheric and Earth Observation data and is embedded within the National Centre for Atmospheric Science and the National Centre for Earth Observation.
• The Environmental Information Data Centre (EIDC) supports the data requirements of the terrestrial and freshwater sciences and is embedded within the UK Centre for Ecology and Hydrology.
• The UK Polar Data Centre (PDC) is responsible for all of the UK’s polar data holdings and associated services and is embedded within the British Antarctic Survey.
• The National Geoscience Data Centre (NGDC) provides geoscience and subsurface data and services and is embedded within the British Geological Survey.
Each of the five environmental data centres specialises in data within a particular sub-discipline of environmental science and serves not only NERC’s science community but also a much broader community of users and stakeholders, spanning research, industry, education, government and voluntary organisations. At the same time, science and its applications are becoming increasingly multi-disciplinary. Users of NERC data will often need to access the services provided by multiple data centres; these services will also need to be interoperable in a European and global context.
In order to serve an ever-growing community of data users and stakeholders, in 2018 NERC created its Environmental Data Service to coordinate across the data centres. During its first five years the EDS has provided growing coordination between NERC’s data centres, both for user services and the development of new, discipline-independent services.
NERC has recently recommissioned its data services for the period 2023-2028. As a consequence, the EDS is embarking on an ambitious plan to deliver increasingly integrated services across the full breadth of NERC science and to meet the environmental data needs of stakeholders and users across the UK and beyond. This will require further development of common back-end services and front-end development of FAIR practices, including standardised vocabularies and ontologies, to support both disciplinary science and wider stakeholder engagement, as well as an increasingly transdisciplinary approach to facilitate both next-generation science and wider data engagement in responding to the grand societal challenges.

How to cite: Harrison, M., Mobbs, S., Bee, E., Peat, H., Snaith, H., Pepler, S., Juckes, M., and Blair, G. and the UKRI NERC Environmental Data Service: The UK Environmental Data Service; transdisciplinary data sharing using common standardised approaches, from National to European, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-5149, https://doi.org/10.5194/egusphere-egu24-5149, 2024.

16:30–16:40
|
EGU24-14586
|
ESSI2.8
|
On-site presentation
Marco Molinaro and the CAESAR Team

CAESAR (Comprehensive Space Weather Studies for the ASPIS Prototype Realisation) is a project funded by ASI (Italian Space Agency) and INAF (Italian National Institute for Astrophysics) for the development of the prototype of ASPIS (ASI SPace weather InfraStructure). We report here design considerations, challenges and the final status of the creation of a database for the ASPIS prototype, which will allow for the study of the chain of phenomena from the Sun to Earth and planetary environments. The database is aimed at handling the heterogeneity of metadata and data while storing and managing the interconnections of various Space Weather events. On top of the database, interfaces for users, including a graphical web interface and an advanced Python module (ASPIS.py), have been developed to facilitate data discovery, access, and analysis. The high-level metadata that inform the discovery phase in the database have been collected using an internally developed tool, ProSpecT (Product Specification Template). This tool utilises JSON Schema and JSONForms to create a web interface that guides data providers in describing their "Products" and generates a JSON object with the necessary metadata. The metadata structure starts from the IVOA VOResource standard, tailored to suit the CAESAR project's requirements. At present, approximately 100 product descriptions in JSON format have been collected and used to create wiki-like documentation pages, in addition to helping examine formats and metadata details for the implementation of the database prototype. The graphical web interface helps users discover, filter, and access the database content, while ASPIS.py also provides more advanced analysis tooling. Moreover, ASPIS.py sits on top of commonly used Python packages such as SunPy, scikit-learn, and matplotlib to help integrate research analysis with other tools and research domains.
The database has been built keeping in mind adherence to FAIR principles and with the idea to make it easily interoperable with other research data infrastructures in the Space Weather or sibling research domains.
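The ProSpecT workflow described above, i.e. JSON Schema plus JSONForms producing one JSON metadata object per "Product", can be sketched in miniature. The product fields, schema, and validator below are invented for illustration (stdlib only) and are not the project's actual schema:

```python
import json

# A hypothetical product description in the spirit of ProSpecT's JSON output;
# the field names are illustrative, not the CAESAR project's actual ones.
product = {
    "title": "Solar wind proton density",
    "provider": "CAESAR partner institute",
    "time_coverage": {"start": "2012-01-01", "end": "2022-12-31"},
    "format": "CDF",
}

# A tiny, hand-rolled subset of a JSON-Schema-like contract.
schema = {
    "required": ["title", "provider", "time_coverage", "format"],
    "types": {"title": str, "provider": str, "time_coverage": dict, "format": str},
}

def validate(doc, schema):
    """Return a list of problems; an empty list means the document passes."""
    problems = [f"missing key: {k}" for k in schema["required"] if k not in doc]
    problems += [
        f"wrong type for {k}" for k, t in schema["types"].items()
        if k in doc and not isinstance(doc[k], t)
    ]
    return problems

print(json.dumps(product))   # the JSON object handed to the database
print(validate(product, schema))  # prints [] : the description is well-formed
```

In the real workflow the schema side is handled by a full JSON Schema validator behind the JSONForms web interface; the point of the sketch is only the pattern: one schema, many provider-written product descriptions checked against it.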

How to cite: Molinaro, M. and the CAESAR Team: Archive prototype for Space Weather phenomena chains from the Sun to the Earth: CAESAR for ASPIS, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-14586, https://doi.org/10.5194/egusphere-egu24-14586, 2024.

16:40–16:50
|
EGU24-12266
|
ESSI2.8
|
ECS
|
On-site presentation
Kaylin Bugbee, Ashish Acharya, Emily Foshee, Muthukumaran Ramasubramanian, Carson Davis, Bishwas Praveen, Kartik Nagaraja, Shravan Vishwanathan, Stephanie Wingo, and Rachel Wyatt

Transformative science often occurs at the boundaries of different disciplines. Making interdisciplinary science data, software and documentation discoverable and accessible is essential to enabling transformative science. However, connecting this diverse and heterogeneous information is often a challenge due to several factors including the dispersed and sometimes isolated nature of data and the semantic differences between topical areas. NASA’s Science Discovery Engine (SDE) has developed several approaches to tackling these challenges. The SDE is a unified, insightful search experience that enables discovery of NASA’s open science data across five topical areas: astrophysics, biological and physical sciences, Earth science, heliophysics and planetary science. In this presentation, we will discuss our efforts to develop a systematic scientific curation workflow to integrate diverse content into a single search environment. We will also share lessons learned from our work to create a metadata crosswalk across the five disciplines. 
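A metadata crosswalk of the kind mentioned above can be illustrated with a minimal sketch; the discipline names match the SDE's five topical areas, but the field mappings and record are hypothetical, not the SDE's actual crosswalk:

```python
# Hypothetical crosswalk: discipline-specific metadata field names mapped
# onto one unified search schema so records from different topical areas
# become comparable in a single index.
CROSSWALK = {
    "earth_science": {"dataset_title": "title", "temporal_extent": "time_range"},
    "heliophysics": {"resource_name": "title", "observation_window": "time_range"},
}

def to_unified(discipline: str, record: dict) -> dict:
    """Rename a record's keys using the discipline's crosswalk entry;
    unmapped keys pass through unchanged."""
    mapping = CROSSWALK[discipline]
    return {mapping.get(k, k): v for k, v in record.items()}

rec = {"resource_name": "SDO/AIA images", "observation_window": "2010-2024"}
print(to_unified("heliophysics", rec))
# {'title': 'SDO/AIA images', 'time_range': '2010-2024'}
```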

How to cite: Bugbee, K., Acharya, A., Foshee, E., Ramasubramanian, M., Davis, C., Praveen, B., Nagaraja, K., Vishwanathan, S., Wingo, S., and Wyatt, R.: The Science Discovery Engine: Connecting Heterogeneous Scientific Data and Information , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12266, https://doi.org/10.5194/egusphere-egu24-12266, 2024.

16:50–17:00
|
EGU24-20127
|
ESSI2.8
|
On-site presentation
Christof Lorenz, Benjamin Louisot, Sabine Barthlott, Benjamin Ertl, Linda Baldewein, Ulrike Kleeberg, Marie Ryan, Nils Brinckmann, Marc Hanisch, Roland Koppe, Marc Adolf, Claas Faber, Andreas Lehmann, David Schäfer, Ralf Kunkel, Ulrich Loup, Jürgen Sorg, and Hylke van der Schaaf

Time-series data are crucial sources of reference information in all environmental sciences. Beyond typical research applications, the consistent and timely publication of such data is increasingly important for monitoring and issuing warnings, especially given the growing frequency of climatic extreme events. In this context, the seven Centres of the Helmholtz Research Field Earth and Environment (E&E) operate some of the largest environmental measurement infrastructures worldwide. These infrastructures range from terrestrial observation systems in the TERENO observatories and ship-borne sensors to airborne and space-based systems, such as those integrated into the IAGOS infrastructures.

In order to streamline and standardize the usage of the huge amount of data from these infrastructures, the seven Centres have jointly initiated the STAMPLATE project. This initiative aims to adopt the Open Geospatial Consortium (OGC) SensorThings API (STA) as a consistent and modern interface tailored for time-series data. We evaluate STA for representative use-cases from environmental sciences and enhance the core data model with additional crucial metadata such as data quality, data provenance and extended sensor metadata. After centre-wide implementation, the standardized STA interface also serves community-based tools, e.g., for data visualization, data access, quality assurance/quality control (QA/QC), or the management of monitoring systems. By connecting the different STA endpoints of the participating research Centres, we establish an interlinked research data infrastructure (RDI) and a digital ecosystem around the OGC SensorThings API tailored towards environmental time-series data.

In this presentation, we want to show the status of the project and give an overview of the current data inventory as well as linked tools and services. We will further demonstrate the practical application of our STA-based framework with simple and representative showcases. With our contribution, we want to promote STA for similar applications and communities beyond our research field. Ultimately, our goal is to provide an important building block towards fostering a more open, FAIR (Findable, Accessible, Interoperable, and Reusable), and harmonized research data landscape in the field of environmental sciences.
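As a minimal sketch of what a client-side query against such an STA endpoint looks like: the resource path (`/v1.1/Things`) and the `$`-prefixed query options follow the OGC SensorThings API standard, while the base URL and the observed-property name are placeholders, not an actual STAMPLATE endpoint:

```python
from urllib.parse import urlencode

# Hypothetical STA endpoint of one participating centre.
base = "https://example.org/sta/v1.1"

# STA query options: filter Things by an observed property, expand the
# first linked Datastream inline, and cap the page size at 10 entities.
params = {
    "$filter": "Datastreams/ObservedProperty/name eq 'air_temperature'",
    "$expand": "Datastreams($top=1)",
    "$top": "10",
}
url = f"{base}/Things?{urlencode(params)}"
print(url)

# A real client would then fetch the OData-style collection, e.g.:
#   things = requests.get(url).json()["value"]
```

Because every centre exposes the same standardized resource model, the identical query works against any of the interlinked endpoints, which is what makes the cross-centre federation described above possible.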

How to cite: Lorenz, C., Louisot, B., Barthlott, S., Ertl, B., Baldewein, L., Kleeberg, U., Ryan, M., Brinckmann, N., Hanisch, M., Koppe, R., Adolf, M., Faber, C., Lehmann, A., Schäfer, D., Kunkel, R., Loup, U., Sorg, J., and van der Schaaf, H.: An interlinked research data infrastructure for time-series data from the Helmholtz Research Field Earth & Environment, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20127, https://doi.org/10.5194/egusphere-egu24-20127, 2024.

17:00–17:10
|
EGU24-15740
|
ESSI2.8
|
On-site presentation
Roope Tervo, Joachim Saalmüller, Umberto Modigliani, Vasileios Baousis, Jörg Schulz, Mike Grant, Francesco Murdaca, Xavier Abellan, and Roberto Cuccu

The European Weather Cloud (EWC) is the cloud-based collaboration platform for meteorological application development and operations in Europe, enabling the digital transformation of the European Meteorological Infrastructure. It consists of data-proximate cloud infrastructure established by EUMETSAT and ECMWF. The EWC is open, and partners can federate access to their data or infrastructure assets.

The EWC is available for EUMETSAT and ECMWF Member and Cooperating States and EUMETSAT Satellite Application Facilities (SAFs), covering both research and operational use cases. Resources are also available for research initiatives, undertaken by one or more EUMETSAT or ECMWF Member States, via specific EUMETSAT Research and Development (R&D) calls and ECMWF Special Projects. Currently, the EWC hosts 16 R&D calls and Special Projects, lasting 1-3 years.

The EWC focuses strongly on the community, taking an iterative, user-needs-based approach to development. Notably, research projects and operational applications use the very same environment, which smooths the transition from research to operations (R2O). The hosted services will also be augmented with the Software Marketplace, providing EWC users with the ability to easily share and exploit meteorological applications, algorithms, and machine-learning models. The EWC provides a Rocket.Chat-based discussion platform for users to discuss and work together, promoting in practice the fundamentally collaborative nature of this cloud offering.

The EWC hosts over 132 diverse use cases covering, for example, data processing, data services, application development, training, EO and weather data image production, post-processing, and experimentation with cloud technologies. To name a few examples in more detail: first, the FEMDI project, consisting of 11 European meteorological services, develops data services employing the EWC for distributing open meteorological data to fulfil the requirements of the EU Open Data Directive. Second, the Norwegian Meteorological Institute (MET) is piloting an infrastructure federation to create water-quality products by locating the processing chain close to the data. Lastly, numerous projects are developing machine-learning-based models in the EWC, including nowcasting, medium-term weather forecasting, and feature detection from climate data.

The main data holding accessible to the EWC users is the sum of all online data and products available at ECMWF and EUMETSAT. Services to access the data support both pull and push paradigms for long time series and time-critical access respectively. The services are supported by related functions, such as display, reformat, etc., as per applicable policies. The data offering will be augmented over time based on user needs. 

From a technological viewpoint, the initiative offers services that draw the highest benefits from cloud technology, taking users’ needs, use cases, and existing software into account. The EWC aims to further develop the service from the current infrastructure-as-a-service (IaaS) model toward platform-as-a-service (PaaS). The plan includes a Kubernetes engine, a high-throughput batch processing engine, function-as-a-service (serverless) capabilities, and several auxiliary services to support collaborative development and operations.

How to cite: Tervo, R., Saalmüller, J., Modigliani, U., Baousis, V., Schulz, J., Grant, M., Murdaca, F., Abellan, X., and Cuccu, R.: The European Weather Cloud (EWC) – Collaboration Platform for Meteorological Development from Research to Applications, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15740, https://doi.org/10.5194/egusphere-egu24-15740, 2024.

17:10–17:20
|
EGU24-16205
|
ESSI2.8
|
ECS
|
On-site presentation
Qi Xu, Xiaoyan Hu, and Ziming Zou

There are many cutting-edge interdisciplinary scientific problems in the Earth and space sciences, such as research on the solar-terrestrial complex system and the study of celestial bodies. Cross-disciplinary data discovery and access services, as well as data analysis and fusion services, are a common need for users working on these cutting-edge problems.

This presentation introduces data services practices implemented by the NSSDC, as a national-level data centre, to enhance the findability and accessibility of data for interdisciplinary research and application. NSSDC has formed a multidisciplinary data resources system covering space astronomy, space physics and the space environment, planetary science, space geoscience, etc. To share data, NSSDC customizes the data service system for each satellite project and ground-based observation project. To enhance discoverability, NSSDC developed a data retrieval platform providing a cross-system, cross-disciplinary, distributed data resource discovery service. Meanwhile, the data catalogues are synchronized to third-party platforms by harvesting or registration through the data retrieval platform. In addition, multidisciplinary analysis and fusion tools and IT infrastructure will be integrated into a research data infrastructure in the field of solar-terrestrial space physics and astronomy.

In particular, NSSDC has established strategic cooperation with other National Science Data Centers in the fields of astronomy and high-energy physics. For the common community around specific cross-cutting scientific problems and applications, NSSDC has engaged in practices such as the co-construction of multi-source data resources, the interconnection of data infrastructures, and the construction of data application ecosystems. Finally, this presentation will also explain NSSDC’s next plans for cooperation with more interdisciplinary data centers on innovation in new data-paradigm technologies.

How to cite: Xu, Q., Hu, X., and Zou, Z.: Chinese National Space Science Data Center (NSSDC)’s Data Services Practices for Interdisciplinary Research and Application, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16205, https://doi.org/10.5194/egusphere-egu24-16205, 2024.

17:20–17:30
|
EGU24-13522
|
ESSI2.8
|
ECS
|
On-site presentation
Nigel Rees, Lesley Wyborn, Rui Yang, Jo Croucher, Hannes Hollmann, Rebecca Farrington, Yue Sun, Yiling Liu, and Ben Evans

The 2030 Geophysics Collections Project was a collaborative effort between the National Computational Infrastructure (NCI), AuScope, Terrestrial Ecosystem Research Network (TERN) and the Australian Research Data Commons (ARDC) that aimed to create a nationally transparent, online geophysics data environment suitable for programmatic access on High Performance Computing (HPC) at the NCI. Key focus areas of this project included the publication of internationally standardised geophysical data on NCI’s Gadi Tier 1 research supercomputer, as well as the development of geophysics and AI-ML related specialised software environments that allow for efficient multi-physics processing, modeling and analysis at scale on HPC systems.

Raw and high-resolution versions of AuScope funded Magnetotelluric (MT), Passive Seismic (PS) and Distributed Acoustic Sensing (DAS) datasets are now accessible on HPC along with selected higher-level data products. These datasets have been structured to enable horizontal integration, allowing disparate datasets to be accessed in real-time as online web services from other repositories. Additionally, vertical integration has been established for MT data, linking the source field acquired datasets with derivative processed data products at the NCI repository, as well as linking to other derivative data products hosted by external data portals.

To support next-generation geophysical research at scale, these valuable datasets and accompanying metadata need to be captured in machine-readable formats and leverage international standards, vocabularies and identifiers. For MT, automations were developed that generate the different MT processing levels at scale in internationally compliant, high-performance data and metadata standards. By parallelising these automated processes across HPC clusters, one can rapidly generate the different processing levels for entire geophysical surveys in a matter of minutes.
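The fan-out pattern described above, i.e. the same per-station processing step applied in parallel across a whole survey, can be sketched with Python's standard library; the station names and the processing step are stand-ins, not the NCI automations themselves (which run across HPC clusters rather than a single worker pool):

```python
from multiprocessing import Pool

def process_station(station: str) -> str:
    """Stand-in for applying one MT processing level to one station's data."""
    return f"{station}: level-1 product written"

# Hypothetical survey: five station identifiers.
stations = [f"MT{n:03d}" for n in range(1, 6)]

if __name__ == "__main__":
    # Fan the independent per-station jobs out over a pool of workers;
    # on an HPC scheduler this would instead be an array of batch jobs.
    with Pool(processes=4) as pool:
        for line in pool.map(process_station, stations):
            print(line)
```

The key property the abstract relies on is that each station's processing is independent, so throughput scales with the number of workers until I/O becomes the bottleneck.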

In parallel with these data enhancements, the NCI-geophysics software environment was developed, which compiled and containerised a wide range of geophysical and data science related packages in Python, Julia and R. In addition, the NCI-AI-ML environment bundled together popular machine learning and data science packages and configured them for HPC GPU architectures. Standalone open source geophysical applications that support parallel computation have also been added to NCI’s Gadi supercomputer. 

The 2030 Geophysics Collections Project has made the first strides towards enabling a new era in Australian geophysical research, opening up the potential for rapid multi-physics geophysical analysis at scale with the computational tools available within the NCI. By establishing and continuing to build on this geophysical infrastructure, the nation will be better equipped to address the various geophysical challenges and opportunities in the decades ahead.

How to cite: Rees, N., Wyborn, L., Yang, R., Croucher, J., Hollmann, H., Farrington, R., Sun, Y., Liu, Y., and Evans, B.: Developing cutting-edge geophysical data and software infrastructure to future-proof national scale geophysical assets for 2030 computation  , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-13522, https://doi.org/10.5194/egusphere-egu24-13522, 2024.

17:30–17:40
|
EGU24-3617
|
ESSI2.8
|
On-site presentation
Mohan Ramamurthy

NSF Unidata is a community data facility for the Earth Systems Sciences (ESS), established in 1984 by U.S. universities with sponsorship from the U. S. National Science Foundation. NSF Unidata exists to engage and serve researchers and educators who are advancing the frontiers of their fields; we support their efforts by creating opportunities for community members from many backgrounds and disciplines to share data, knowledge, methods, and expertise. As part of this effort, we strive to provide well-integrated data services and software tools that address the entire geoscientific data lifecycle, from locating and retrieving useful data, through the process of analyzing and visualizing data either locally or remotely, to curating and sharing the results. NSF Unidata currently serves more than 1,500 universities and colleges worldwide, which form the core of a member community spanning thousands of government and research institutions worldwide that rely on Unidata products and services.

Dramatic changes in the technological, scientific, educational, and public policy landscape are transforming the ways our community members conduct their research and educate new generations of scientists. To meet these challenges, Unidata is reimagining how the program can best fulfill its mission. This presentation describes how Unidata plans to serve its community going forward by focusing on four types of activities:

  • Providing Data and Tools: ensuring fair and equitable access to ESS and other data from a variety of sources, along with cutting-edge tools to analyze and visualize that data.
  • Reducing Barriers to Participation: building partnerships with minority-serving institutions and under-resourced groups to increase engagement and collaboration, helping to build a larger, more inclusive community of ESS practitioners.
  • Fostering Community Action: engaging community members to advance adoption of initiatives like FAIR and CARE data principles to promote Open Science concepts, strengthening ESS teaching and research.
  • Providing Innovative Technical Solutions: guiding the ESS community toward technical solutions that leverage the most useful innovations in AI/ML, modern open source software, and cloud-centric data-proximate analysis.

Within these broad categories, Unidata proposes a variety of actions guided by the concept of convergence science, wherein individuals from across many disciplines collaborate to address “Grand Challenge” questions in areas such as climate change, ocean health, and natural disaster resilience. Unidata’s part in this endeavor centers on the creation of community hubs, which will bring together varied data, software tools for analysis and visualization, and learning resources to inform the community members who gather to find innovative courses of action with respect to these complex problems.

In this presentation, I’ll describe how NSF Unidata is reimagining its future activities in delivering a comprehensive suite of products and services to advance Earth Systems Science research and education by partnering with a broad range of users in the community.

How to cite: Ramamurthy, M.: NSF Unidata Reimagined:  Data Services to Advance Convergent Earth Systems Science, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-3617, https://doi.org/10.5194/egusphere-egu24-3617, 2024.

17:40–17:50
|
EGU24-20760
|
ESSI2.8
|
On-site presentation
Clément Albinet, Aimee Barciauskas, Kathleen Baynes, George W. Chang, Brian M. Freitag, Laura Innice Duncanson, Gerald F. Guala, Hua Hook, Neha Hunka, Henri Laur, Marco Lavalle, Cristiano Lopes, Alex Mandel, David F. Moroni, Tamara Queune, Sujen Shah, and Nathan Marc Thomas

The scientific community is faced with a need for greatly improved data sharing, analysis, visualization and advanced collaboration based firmly on open science principles. Recent and upcoming launches of new satellite missions with more complex and voluminous data, as well as the ever more urgent need to better understand the global carbon budget and related ecological processes provided the immediate rationale for the ESA-NASA Multi-mission Algorithm and Analysis Platform (MAAP).

This highly collaborative joint project of ESA and NASA established a framework between ESA and NASA to share data, science algorithms and compute resources in order to foster and accelerate scientific research conducted by ESA and NASA EO data users. Presented to the public in October 2021 [1], the current version of MAAP provides a common cloud-based platform with computing capabilities co-located with the data, a collaborative coding and analysis environment, and a set of interoperable tools and algorithms developed to support, for example, the estimation and visualization of global above-ground biomass.

Data from the Global Ecosystem Dynamics Investigation (GEDI) mission on the International Space Station [2] and the Ice, Cloud, and Land Elevation Satellite-2 (ICESat-2) [3] have been instrumental pioneer products on MAAP, generating the first comprehensive map of Boreal above-ground biomass [4] and supporting the CEOS Biomass Harmonization Activity [5]. Crucially, the platform is being specifically designed to support the forthcoming ESA Biomass mission [6] and incorporate data from the upcoming NASA-ISRO SAR (NISAR) mission [7]. While these missions and the corresponding research leading up to launch, which includes airborne, field, and calibration/validation data collection and analyses, provide a wealth of information relating to global biomass, they also present data storing, processing and sharing challenges; the NISAR mission alone will produce around 40 petabytes of data per year, presenting a challenge that, without MAAP, would impose several accessibility limits on the scientific community and impact scientific progress.

Other challenges being addressed by MAAP include: 1) Enabling researchers to easily discover, process, visualize and analyze large volumes of data from both agencies; 2) Providing a wide variety of data in the same coordinate reference frame to enable comparison, analysis, data evaluation, and data generation; 3) Providing a version-controlled science algorithm development environment that supports tools, co-located data and processing resources; 4) Addressing intellectual property and sharing challenges related to collaborative algorithm development and sharing of data and algorithms.

 

REFERENCES

[1] https://www.nasa.gov/feature/nasa-esa-partnership-releases-platform-for-open-source-science-in-the-cloud

[2] https://science.nasa.gov/missions/gedi

[3] https://icesat-2.gsfc.nasa.gov/

[4] https://daac.ornl.gov/ABOVE/guides/Boreal_AGB_Density_ICESat2.html            

[5] https://iopscience.iop.org/article/10.1088/1748-9326/ad0b60

[6] T. Le Toan, S. Quegan, M. Davidson, H. Balzter, P. Paillou, K. Papathanassiou, S. Plummer, F. Rocca, S. Saatchi, H. Shugart and L. Ulander, “The BIOMASS Mission: Mapping global forest biomass to better understand the terrestrial carbon cycle”, Remote Sensing of Environment, Vol. 115, No. 11, pp. 2850-2860, June 2011.

[7] P.A. Rosen, S. Hensley, S. Shaffer, L. Veilleux, M. Chakraborty, T. Misra, R. Bhan, V. Raju Sagi and R. Satish, "The NASA-ISRO SAR mission - An international space partnership for science and societal benefit", IEEE Radar Conference (RadarCon), pp. 1610-1613, 10-15 May 2015.

How to cite: Albinet, C., Barciauskas, A., Baynes, K., Chang, G. W., Freitag, B. M., Duncanson, L. I., Guala, G. F., Hook, H., Hunka, N., Laur, H., Lavalle, M., Lopes, C., Mandel, A., Moroni, D. F., Queune, T., Shah, S., and Thomas, N. M.: Joint ESA-NASA Multi-Mission Algorithm and Analysis Platform (MAAP), EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20760, https://doi.org/10.5194/egusphere-egu24-20760, 2024.

17:50–18:00
|
EGU24-18756
|
ESSI2.8
|
On-site presentation
Benedikt Gräler, Johannes Schnell, Katharina Demmich, Yagmur Yildiz, Merel Vogel, Julia Kraatz, Stefano Bagli, and Paolo Mazzoli

Given the global scope of the current climate crisis, it is important that it be addressed in all sectors of society. From the increased risk of extreme weather events to the heightened variability in climate patterns, data and knowledge sharing among citizens and scientists alike is necessary for planning a sustainable future. The I-CISK project therefore aims to create a human-centered, co-designed, co-created, co-implemented, and co-evaluated climate service (CS) that enables citizens, stakeholders, and decision-makers to make climate-informed decisions.

Drawing on discussions with I-CISK partners and input from the seven Living Labs (LLs), the first preoperational CSs have been developed in the project's current stage. User stories derived from these discussions guided the creation of the preoperational CSs, ensuring that the data and information displayed are tailored to the needs of end-users.

One key challenge during the development of the CSs was presenting weather and climate variables in a way that end-users can easily understand, while simultaneously addressing the questions posed by different stakeholders. Scale raised a significant issue here: users often preferred to have data visualized on a local scale, whereas most forecast data were only available at a larger scale, so forecast data had to be spatially corrected to fit this requirement. Another issue was providing visualizations that enable end-users to readily understand forecast uncertainty: because forecasts of future weather patterns are calculated with different climate models, there is inherent uncertainty when comparing them. There is thus no single “truth”, and it was imperative to make this clear in the preoperational CSs. To this end, functional and sketch-based mock-ups were designed, discussed with end-users and within the consortium, and iteratively developed further based on the feedback.
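The spatial correction of coarse forecast data to a local scale can be illustrated with a simple delta-change adjustment. This is a generic sketch, not necessarily the method used in I-CISK; all names and numbers are illustrative:

```python
def delta_correct(forecast, model_clim, obs_clim):
    """Shift forecast values by the local observed-minus-model
    climatological offset (a simple 'delta change' bias adjustment)."""
    offset = obs_clim - model_clim
    return [value + offset for value in forecast]

# Coarse-grid forecast of 2 m temperature, adjusted to a station whose
# observed climatology is 1.5 degrees warmer than the model climatology.
corrected = delta_correct([10.0, 12.5, 11.0], model_clim=8.0, obs_clim=9.5)
print(corrected)  # [11.5, 14.0, 12.5]
```

More elaborate approaches (e.g. quantile mapping) follow the same pattern of correcting model output against local observations.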

Alongside the challenge of clearly visualizing climate information, another key challenge was finding the most robust and relevant data sources to serve local information needs. Meeting the data requirements meant gathering not only forecast data but also observed historical data. With both displayed in the preoperational CSs, users were able to compare past and future weather patterns with their own personal experience, which further helped them understand the information relayed in the CSs and boosted their ability to assess climate predictions.

In this contribution, we present the general approach of co-designing the preoperational CSs and what we derived from it. We will also present the technical set-up for integrating the various data sources, the Docker-based, semi-automated concept for deploying the individual CS applications in the cloud, and, finally, the next steps to engage users with the current functional CS mock-ups. This work highlights the importance of creating CSs with a human-centered approach and demonstrates how this has been done within the I-CISK project framework.

How to cite: Gräler, B., Schnell, J., Demmich, K., Yildiz, Y., Vogel, M., Kraatz, J., Bagli, S., and Mazzoli, P.: A Preoperational Climate Service Information System: Addressing Technical Challenges and Enhancing User Engagement, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18756, https://doi.org/10.5194/egusphere-egu24-18756, 2024.

Posters on site: Thu, 18 Apr, 10:45–12:30 | Hall X2

The posters scheduled for on-site presentation are only visible in the poster hall in Vienna. If authors uploaded their presentation files, these files are linked from the abstracts below, but only on the day of the poster session.
Display time: Thu, 18 Apr 08:30–Thu, 18 Apr 12:30
Chairpersons: Heinrich Widmann, Kirsten Elger, Kerstin Lehnert
X2.23
|
EGU24-5393
|
ESSI2.8
Alexander Wolodkin, Anette Ganske, Angelina Kraft, Andrea Lammer, Claudia Martens, Ichrak Salhi, Markus Stocker, Hannes Thiemann, and Claus Weiland

The BITS project is building a Terminology Service (TS) for the Earth System Sciences (ESS TS). As a first step, it is developing this service for the subfields of climate science (results from climate simulations) and geodiversity collections (representing Earth's diversity of, inter alia, rocks, fossils, soils, and sediments). The project builds on the existing Terminology Service of the TIB – Leibniz Information Centre for Science and Technology, which currently contains 190 ontologies, more than 1.2 million terms, and over 26,000 properties from a range of domains such as architecture, chemistry, computer science, mathematics, and physics. An ESS collection has already been implemented within the TIB TS; it contains relevant terminologies for the ESS, and further relevant terminologies will be added.

The ESS TS will be integrated into the two different data repositories of the German Climate Computing Center (DKRZ) and the Senckenberg - Leibniz Institution for Biodiversity and Earth System Research (SGN):

  • DKRZ will use the TS to develop a more user-friendly search for its World Data Center for Climate (WDCC) repository. The terminologies will be used to suggest additional and more relevant search terms to users. This will help users who are unfamiliar with the terminology used by the climate community to find the right keywords for their data search and to get accurate and high quality search results.
  • SGN will use the TS to add standardised structured metadata to geothematic Digital Specimens in their digital collections. This will increase the FAIRness of collection data, i.e. foster self-contained discovery and processing of Digital Specimens by software agents or, in short, machines (machine actionability).  

The experience gained in building the ESS TS and integrating it into the repositories at DKRZ and SGN will be used to create blueprints for later connecting other Earth System Science repositories to the TS. We also aim to work closely with NFDI4Earth and the wider ESS community, and with TS4NFDI as the NFDI base service project for terminology services.

As BITS evolves, the ESS TS will be supplemented by additional components, e.g. to support FAIR semantic mappings (leveraging SGN's mapping.bio service). Feedback from the wider ESS community about their needs and expectations for such a service is welcome and indeed required for the project. Our aim is a Terminology Service that serves as a valuable resource for researchers, students, professionals, and developers in the ESS, providing them with accurate and consistent terminology to enhance their work, improve communication and data sharing, and advance knowledge in their respective fields.

How to cite: Wolodkin, A., Ganske, A., Kraft, A., Lammer, A., Martens, C., Salhi, I., Stocker, M., Thiemann, H., and Weiland, C.: BITS: BluePrints for the Integration of Terminology Services in Earth System Sciences, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-5393, https://doi.org/10.5194/egusphere-egu24-5393, 2024.

X2.24
|
EGU24-4067
|
ESSI2.8
|
ECS
Lucia Mandon, Bernard Schmitt, Damien Albert, Manon Furrer, Philippe Bollard, Maria Gorbacheva, Lydie Bonal, and Olivier Poch

A critical missing database for the astrophysics and planetary science communities using spectroscopy data is a compilation of band parameters (e.g., position, width, intensity) of solids, for comparison with laboratory and field spectra and with observations of extraterrestrial objects. While many such databases exist for gases [1], there is a scarcity for solids and liquids (mostly tables in a few books and review papers), and the mode attribution of bands is not always clear.

The Solid Spectroscopy Hosting Architecture of Databases and Expertise (SSHADE) (https://www.sshade.eu/; [2]) hosts data from 30 research groups in spectroscopy of solids across 15 countries. It provides spectra of solids relevant to astrophysics and planetary science (ices, minerals, carbonaceous matters, meteorites…) over a wide range of wavelengths (mostly X-ray and VUV to sub-mm). The initial compilation of the “BandList” database [3], hosted in SSHADE, was publicly released in October 2021. It is an ongoing effort to provide the parameters (position, width, intensity, shape, and their accuracies) associated with the electronic, vibration, and phonon bands of simple solids (ices, simple organics, minerals), in absorption and in Raman emission, and for different pressures and temperatures. It also includes the solid composition and isotopic species involved, as well as the mode assignment (Fig. 1). The database is compiled from an exhaustive review of the literature and from laboratory measurements on well-characterized species; as of early 2024, it comprised over 1240 bands in 60 different band lists, including minerals and ices in different phases. An online search tool allows users to find specific bands or lists. Results can be displayed graphically using a spectra simulator with various unit and display options (Fig. 1), and data can be exported for further analysis.
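The idea of a spectra simulator that displays a band list as individual bands and as their sum can be sketched as follows. A Gaussian profile is used here purely for illustration (SSHADE supports other band shapes), and all numbers are made up:

```python
import math

def band(x, center, width, intensity):
    # Gaussian profile as a simple stand-in for an absorption band shape
    return intensity * math.exp(-((x - center) / width) ** 2)

def spectrum(x, band_list):
    # total absorption at position x: the sum over all bands in the list,
    # as in the "sum of bands of whole band list" display mode
    return sum(band(x, c, w, i) for (c, w, i) in band_list)

# two hypothetical bands: (position, width, intensity) in arbitrary units
bands = [(2500.0, 20.0, 1.0), (2550.0, 15.0, 0.4)]
print(spectrum(2500.0, bands))
```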

 

Figure 1. Absorption band list of natural calcite [4] from the SSHADE-BandList interface. (a) Bands displayed individually. (b) Sum of bands of whole band list.

Development of the SSHADE-BandList database interface and content will continue for many years. This tool is expected to be crucial in identifying unknown absorption bands in astrophysical and solar system objects, in selecting the best spectra to use in radiative transfer models, and in guiding the design of new instruments.

 

References 

[1] Albert et al. (2020), Atoms 8(4)

[2] Schmitt et al. (2014), EPSC 2014

[3] Schmitt et al. (2022), EPSC 2022

[4] Leclef and Schmitt (2022), SSHADE/BANDLIST (OSUG Data Center)

How to cite: Mandon, L., Schmitt, B., Albert, D., Furrer, M., Bollard, P., Gorbacheva, M., Bonal, L., and Poch, O.: SSHADE-BandList, a novel database of absorption and Raman bands of solids , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-4067, https://doi.org/10.5194/egusphere-egu24-4067, 2024.

X2.25
|
EGU24-15031
|
ESSI2.8
Nils Brinckmann, Michael Langbein, Benjamin Proß, Arne Vogt, and Elisabeth Schöpfer

Analysing individual hazards and the associated risk is a challenging task in its own right, requiring expertise built up over decades. Unfortunately, there are situations where a single hazard triggers further - often horrific - consequences. The history of international catastrophes is full of examples: the fires in San Francisco after the 1906 earthquake, caused by destroyed gas pipelines; the tsunami that destroyed the Fukushima nuclear power plant after the Tohoku earthquake; or the climatic effects of the Krakatau eruption in 1883.

In our RIESGOS project we have been working on a demonstrator app for analysing multi-risk scenarios, with a strong focus on the earthquake-tsunami combination. This use case is highly relevant in our partner countries Ecuador, Peru, and Chile, and the corresponding knowledge is provided by the partner institutions of the RIESGOS consortium.

The technical approach is strongly standards-based, using OGC Web Processing Services, and distributed. This allows the specific expertise of each partner institution to be taken into account and the data and algorithms that have been built up and refined over years to be shared.

In this presentation we focus on a deeper insight into the implementation perspective, covering the benefits as well as the strategies to overcome challenging aspects that we encountered when working with the distributed risk analysis framework. These include requirements on interoperability, deployment of bundled versions for testing and transfer, monitoring, and others.

How to cite: Brinckmann, N., Langbein, M., Proß, B., Vogt, A., and Schöpfer, E.: Implementing a Distributed Processing Framework for Multi-Risk Analysis - A Lessons Learned Perspective, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15031, https://doi.org/10.5194/egusphere-egu24-15031, 2024.

X2.26
|
EGU24-12978
|
ESSI2.8
|
ECS
Joost Hemmen, David Schäfer, Martin Abbrent, Florian Gransee, Tobias Kuhnert, Bert Palm, Maximilian Schaldach, Christian Schulz, Martin Schrön, Thomas Schnicke, and Jan Bumberger

Robust infrastructures for managing and accessing high-volume data are an essential foundation for unraveling complex spatiotemporal processes in the earth system sciences. Addressing multifaceted research questions demands data from diverse sources; however, isolated solutions hinder effective collaboration and knowledge advancement.

We present a novel digital ecosystem for FAIR time series data management, deeply rooted in contemporary software engineering and developed at the Helmholtz Centre for Environmental Research (UFZ) in Leipzig, Germany. Designed to flexibly address discipline-specific requirements and workflows, the system emphasizes user-centric accessibility, ensuring the reliability, efficiency, and sustainability of time series data across different domains and scales.

Our time series ecosystem includes a user-centric web-based frontend for (real-time) data flow and metadata management, a versatile data integration layer, a robust time series database, efficient object storage, near real-time quality control, and comprehensive data visualization capabilities. Supporting modern and classical data transfer protocols, the system ensures compliance with OGC standards for data access, facilitating efficient progress in the data lifecycle through high-performance computing. This fully integrated and containerized solution enables swift deployment and seamless integration with existing services.
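A near real-time quality-control step of the kind mentioned above can be as simple as a plausibility range test. This is a generic sketch of such a test, not the UFZ system's actual implementation; the data and thresholds are hypothetical:

```python
def range_flag(values, lower, upper):
    """Flag each observation that falls outside a plausible physical
    range -- one of the most basic automatic quality-control tests."""
    return ["OK" if lower <= v <= upper else "SUSPECT" for v in values]

# soil-moisture-like series (volumetric fraction) with one implausible spike
flags = range_flag([0.21, 0.23, 9.99, 0.22], lower=0.0, upper=1.0)
print(flags)  # ['OK', 'OK', 'SUSPECT', 'OK']
```

Production systems chain many such tests (range, spike, flatline, cross-sensor) and record the resulting flags alongside the data.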

Illustrating the practical application of the system, we showcase its success in managing Cosmic Ray Neutron Sensing data from the TERENO project. This success story underscores the system's effectiveness in addressing challenges associated with time series data management in earth system sciences, fostering more efficient research and facilitating informed decision-making processes.

This contribution aligns seamlessly with the session's focus on connecting RDIs. We aim to promote transferable approaches, use existing standards, and facilitate collaborations transcending barriers among RDI providers, developers, and researchers. By presenting our experiences and best practices, this presentation invites engagement and discussions to collectively address the challenges in bringing research data infrastructures together.

How to cite: Hemmen, J., Schäfer, D., Abbrent, M., Gransee, F., Kuhnert, T., Palm, B., Schaldach, M., Schulz, C., Schrön, M., Schnicke, T., and Bumberger, J.: Advancing Data Management: A Novel Digital Ecosystem for FAIR Time Series Data Management in Earth System Sciences, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12978, https://doi.org/10.5194/egusphere-egu24-12978, 2024.

X2.27
|
EGU24-10364
|
ESSI2.8
Marcus Strobl, Elnaz Azmi, Bischof Balazs, Safa Bouguezzi, Alexander Dolich, Sibylle K. Hassler, Mirko Mälicke, Ashish Manoj J, Jörg Meyer, Achim Streit, and Erwin Zehe

The rapid growth of environmental data and the complexity of data pre-processing tasks pose significant challenges to environmental scientists. Repetitive and error-prone manual data preparation methods not only consume valuable research time but also introduce potential data quality issues. Moreover, individually pre-processed datasets are hardly reproducible. The V-FOR-WaTer virtual research environment (VRE) addresses these challenges as a powerful tool that seamlessly integrates data access, pre-processing, and exploration capabilities.

V-FOR-WaTer has an automated data pre-processing workflow to improve data preparation by eliminating the need for manual data cleaning, standardization, harmonization, and formatting. This approach significantly reduces the risk of human error while freeing up researchers to focus on their actual research questions without being hampered by data preparation. The pre-processing tools integrated in the virtual research environment are designed to handle a wide range of data formats, ensuring consistent and reliable data preparation across diverse disciplines. This empowers researchers to seamlessly integrate data from various sources in a standardized manner.
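Harmonizing data from various sources often starts with aligning them on common timestamps. The following is a trivial stand-in for such a standardization step (the function and the data are hypothetical, not V-FOR-WaTer code):

```python
def align_on_common_times(series_a, series_b):
    """Keep only timestamps present in both series -- a minimal
    example of harmonizing two differently sampled records."""
    common = sorted(set(series_a) & set(series_b))
    return [(t, series_a[t], series_b[t]) for t in common]

# hypothetical daily precipitation and runoff records with different coverage
precip = {"2024-01-01": 3.2, "2024-01-02": 0.0, "2024-01-03": 1.1}
runoff = {"2024-01-02": 0.8, "2024-01-03": 0.9, "2024-01-04": 0.7}
print(align_on_common_times(precip, runoff))
# [('2024-01-02', 0.0, 0.8), ('2024-01-03', 1.1, 0.9)]
```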

The web portal's user-centric design facilitates data exploration and selection through map operations and filtering options, empowering researchers to efficiently identify and focus on relevant data for their analyses. The scalability and extensibility of the V-FOR-WaTer web portal ensures that it can accommodate the ever-growing volume of environmental data and adapt to the evolving research landscape. Its ability to integrate user-developed tools reflects the dynamic nature of environmental research and ensures that the virtual research environment stays up-to-date with the latest advancements in data processing. The comprehensive features and user-friendly interface position it as a valuable tool for environmental scientists, fostering collaboration, streamlining data analysis, and accelerating the advancement of knowledge in the field of hydrology.

How to cite: Strobl, M., Azmi, E., Balazs, B., Bouguezzi, S., Dolich, A., Hassler, S. K., Mälicke, M., Manoj J, A., Meyer, J., Streit, A., and Zehe, E.: Streamlining Data Pre-processing and Analysis through the V-FOR-WaTer Web Portal, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-10364, https://doi.org/10.5194/egusphere-egu24-10364, 2024.

X2.28
|
EGU24-13159
|
ESSI2.8
|
ECS
Haytam Elyoussfi, Abdelghani Boudhar, Salwa Belaqziz, Mostafa Bousbaa, Karima Nifa, Bouchra Bargam, Ismail Karaoui, Ayoub Bouihrouchane, Touria Benmira, and Abdelghani Chehbouni

Data-driven methods, such as machine learning (ML) and deep learning (DL), play a pivotal role in advancing the field of snow hydrology. These techniques harness the power of algorithms to analyze and interpret vast datasets, allowing researchers to uncover intricate patterns and relationships within the complex processes of snow dynamics. In snow hydrology, where traditional models may struggle to capture the nonlinear and dynamic nature of snow-related phenomena, data-driven methods provide a valuable alternative. Using them, however, requires advanced skills in various fields, such as programming and hydrological modeling. In response to these challenges, we have developed an open-source Python package named MorSnowAI that streamlines the process of building, training, and testing artificial intelligence models based on ML and DL methods. MorSnowAI also significantly simplifies the collection of data from various sources and formats, such as reanalysis datasets (ERA5-Land) from the Copernicus Climate Data Store and remote sensing data from the MODIS, Landsat, and Sentinel missions, e.g. to calculate the Normalized Difference Snow Index (NDSI). It can also use local datasets as model inputs. Among its other features, the MorSnowAI package provides selectable pre-processing and post-processing methods, along with visualization and analysis of the available time series. The scripts developed in the MorSnowAI package have already been evaluated and tested in various snow hydrology applications, including the prediction of snow depth, streamflow, snow cover, snow water equivalent, and groundwater levels in mountainous areas of Morocco.
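The NDSI mentioned above is a standard normalized difference of green and shortwave-infrared reflectance; a minimal per-pixel sketch (not MorSnowAI's actual code; reflectance values are made up):

```python
def ndsi(green, swir):
    """Normalized Difference Snow Index:
    (green - swir) / (green + swir)."""
    if green + swir == 0:
        return 0.0  # avoid division by zero over no-data pixels
    return (green - swir) / (green + swir)

# snow reflects strongly in the green band and weakly in the SWIR band,
# so snow-covered pixels give NDSI values close to 1
print(ndsi(green=0.8, swir=0.1))
```

In practice the same formula is applied band-wise to whole MODIS, Landsat, or Sentinel scenes, typically as an array operation.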
The automated processes within MorSnowAI contribute to advancing the field, enabling researchers to focus on refining model inputs, interpreting results, and improving the overall understanding of complex hydrological systems. By bridging the gap between hydrology and advanced data-driven techniques, MorSnowAI fosters advancements in research, offering valuable insights for resource management in regions heavily influenced by snow dynamics. 

How to cite: Elyoussfi, H., Boudhar, A., Belaqziz, S., Bousbaa, M., Nifa, K., Bargam, B., Karaoui, I., Bouihrouchane, A., Benmira, T., and Chehbouni, A.: MorSnowAI v1.0 : An Open-Source Python Package for Empowering Artificial Intelligence in Snow Hydrology - A Comprehensive Toolbox, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-13159, https://doi.org/10.5194/egusphere-egu24-13159, 2024.

X2.29
|
EGU24-13651
|
ESSI2.8
Charles Zender

Research data infrastructures (RDIs) like the Coupled Model Intercomparison Project (CMIP) exemplify geoscientific dataset archive organization and applied informatics. The CMIP metadata and data policies have continuously co-evolved with mature and FAIR technologies (e.g., CF, OPeNDAP, ESGF) that are, in turn, often adopted by other RDIs. Improved lossy and lossless compression support in the standard netCDF/HDF5 scientific software stack merits consideration for adoption in upcoming MIPs and RDIs like CMIP7. We have proposed a three-point plan to CMIP7 to utilize modern lossy and lossless compression to reduce its storage and power requirements (and the associated greenhouse gas emissions). The plan will boost the compression ratio of CMIP-like datasets by a factor of about three relative to CMIP6, preserve all scientifically meaningful data, and retain CF compliance. We will present the plan and discuss why and how to implement it in CMIP7 and other MIPs and RDIs.
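The way lossy quantization ("bit rounding") boosts subsequent lossless compression can be illustrated with the standard library alone. This sketch is not the proposed CMIP7 configuration; `keepbits=7` and the sample data are arbitrary illustrative choices:

```python
import struct
import zlib

def round_bits(x, keepbits):
    """Zero the low (23 - keepbits) mantissa bits of a float32 --
    a simple lossy quantization that preserves keepbits significant
    bits of precision."""
    (i,) = struct.unpack("<I", struct.pack("<f", x))
    mask = (0xFFFFFFFF << (23 - keepbits)) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", i & mask))[0]

values = [0.1 * k for k in range(10000)]
raw = struct.pack("<10000f", *values)
quantized = struct.pack("<10000f", *(round_bits(v, 7) for v in values))

# the zeroed mantissa bits make the byte stream far more compressible
print(len(zlib.compress(raw)), len(zlib.compress(quantized)))
```

In the netCDF/HDF5 stack the same idea appears as built-in quantization combined with a lossless filter such as deflate.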

How to cite: Zender, C.: Why and How to Increase Dataset Compression in RDIs and MIPs like CMIP7, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-13651, https://doi.org/10.5194/egusphere-egu24-13651, 2024.

X2.30
|
EGU24-18056
|
ESSI2.8
Antonio S. Cofiño and David Dominguez Roman

Internationally coordinated climate model intercomparison projects (MIPs) explore the uncertainties inherent to climate change science. The Multi-MIP Climate Change ATLAS repository [1] is the backbone of the Sixth IPCC Assessment Report (AR6) Atlas chapter, which provides a region-by-region assessment of climate change and includes the innovative Interactive Atlas [2]. The Interactive Atlas complements the report by providing flexible spatial and temporal analyses of regional climate change based on different MIPs.

The IPCC AR6 promotes best practices in the traceability and reproducibility of the results shown in the report, including the adoption of the Findable, Accessible, Interoperable, and Reusable (FAIR) principles for scientific data. In particular, reproducibility and reusability are central to ensuring the transparency of the final products. The ATLAS products are generated using free community software tools, based on the climate4R framework [3], for data post-processing (data access, regridding, aggregation, bias adjustment, etc.), evaluation, and quality control (where applicable). All the ATLAS code is made publicly available as notebooks and scripts [1].

The Executable Book Project (EBP) [4] is an international collaboration between several universities and open-source projects to build tools that facilitate computational narratives (books, lectures, articles, etc.). It allows users from the scientific and academic communities to merge rich text content, output from live code, references, cross-references, equations, and images; execute content and cache the results; combine cached outputs and content files into a document model; build interactive (i.e. HTML) and publication-quality (PDF) outputs; and control everything from a simple interface.

In this contribution, we demonstrate a computational book created with the JupyterBook ecosystem, binding the code scripts and notebooks from the Multi-Model Intercomparison Project (Multi-MIP) Climate Change Atlas repository to improve their reproducibility and reusability.
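A Jupyter Book of this kind is driven by a small table-of-contents file; a minimal sketch, assuming a recent Jupyter Book version (the file names are hypothetical, not the actual ATLAS layout):

```yaml
# _toc.yml -- table of contents for the executable book
format: jb-book
root: intro
chapters:
- file: notebooks/reference-regions
- file: notebooks/atlas-hatching
```

The book is then built with `jupyter-book build <path>`, which executes the notebooks, caches their outputs, and renders the HTML or PDF version.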

Acknowledgement: This work is partly supported by: project CORDyS (PID2020-116595RB-I00) funded by MCIN/AEI/10.13039/501100011033; Ministry for the Ecological Transition and the Demographic Challenge (MITECO) and the European Commission NextGenerationEU (Regulation EU 2020/2094), through CSIC's Interdisciplinary Thematic Platform Clima (PTI-Clima); and, the ENES-RI and IS-ENES3 project which is funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 824084 

[1] https://github.com/SantanderMetGroup/ATLAS
[2] http://interactive-atlas.ipcc.ch
[3] https://github.com/SantanderMetGroup/climate4R
[4] https://executablebooks.org

How to cite: Cofiño, A. S. and Dominguez Roman, D.: Executable Book for the IPCC AR6 ATLAS products, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18056, https://doi.org/10.5194/egusphere-egu24-18056, 2024.

X2.31
|
EGU24-19677
|
ESSI2.8
Kameswarrao Modali, Karsten Peters-von Gehlen, Florian Ziemen, Rajveer Saini, Simon Grasse, and Martin Schultz

As High Performance Computing (HPC) marches into the exascale era, earth system models have entered a numerical regime in which global simulations with 1 km spatial resolution are a reality and are currently being performed at various HPC centers across the globe. In this contribution, we provide an overview of the strategy and plans to adapt the data handling services and workflows available at the German Climate Computing Center (DKRZ) and the Jülich Supercomputing Centre (JSC) to enable efficient access to, processing of, and sharing of output from such simulations with current and next-generation Earth System Models. These activities are carried out in the framework of projects funded at the EU as well as the national level, such as NextGEMS, WarmWorld, and EERIE.

With the increase in spatial resolution comes an inevitable jump in the volume of output data. In particular, the throughput enabled by the enhanced computing power surpasses the capacity of single-tier storage systems made up of homogeneous hardware and necessitates multi-tier storage systems consisting of heterogeneous hardware. As a consequence, new issues arise for efficient, user-friendly data management within each site. Sharing model outputs that may be produced at different data centers and stored across different multi-tier storage systems poses additional challenges, both in technical terms (efficient data handling, data formats, reduction of unnecessary transfers) and in semantic terms (data discovery and selection across sites). Furthermore, there is an increasing need for scientifically operational solutions, which requires the development of long-term strategies that can be sustained within the different data centers. To achieve all of this, existing workflows need to be analyzed and largely rewritten. On the upside, this will allow the introduction of new concepts and technologies, for example using the recent zarr file format instead of the more traditional netCDF format.
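The appeal of chunked formats such as zarr for multi-tier storage can be illustrated by computing which chunk objects a read request touches. This is a generic stdlib sketch of the access pattern, not any specific library's code:

```python
def chunks_touched(start, stop, chunk_size):
    """Indices of the fixed-size chunks a 1-D slice [start, stop) reads.
    With chunked storage, only these objects need to be fetched from
    the (possibly remote) storage tier, not the whole array."""
    return list(range(start // chunk_size, (stop - 1) // chunk_size + 1))

# reading elements 1500..3499 of an array stored in chunks of 1000
# elements touches only chunks 1, 2 and 3
print(chunks_touched(1500, 3500, 1000))  # [1, 2, 3]
```

The same index arithmetic generalizes per dimension for multi-dimensional chunk grids.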

More specifically, in WarmWorld, the strategy is to create an overarching user interface to enable discovery of the federated data and to implement the backend infrastructure for handling data movement across the storage tiers (SSD, HDD, tape, cloud), within as well as across the HPC centers, as required by the analytical tasks. This approach will also leverage community efforts in redesigning the way km-scale models provide their output, i.e. on hierarchical grids and in relatively small chunks.

We present specific ongoing work to implement this data handling strategy across HPC centers and outline the vision for the handling of high-volume climate model simulation output in the exascale era to enable the efficient analysis of the information content from these simulations. 

How to cite: Modali, K., Peters-von Gehlen, K., Ziemen, F., Saini, R., Grasse, S., and Schultz, M.: A cascaded framework for unified access to and analysis of kilometer scale global simulations across a federation of data centers, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19677, https://doi.org/10.5194/egusphere-egu24-19677, 2024.

X2.32
|
EGU24-16749
|
ESSI2.8
Emanuel Soeding, Andrea Poersch, Yousef Razeghi, Dorothee Kottmeier, and Stanislav Malinovschii

At the Helmholtz Association, we strive to establish a well-formed harmonized data space, connecting information across distributed data infrastructures. This requires standardizing the description of data sets with suitable metadata to achieve interoperability and machine actionability.

One way to make connections between datasets and to avoid redundancy in metadata is the consistent use of Persistent Identifiers (PIDs). PIDs are an integral element of the FAIR principles (Wilkinson et al. 2016) and are recommended for referring to data sets. Other metadata entities, such as people, organizations, projects, laboratories, repositories, publications, vocabularies, samples, instruments, licenses, and methods, should also be referenced by PIDs, but agreed identifiers do not yet exist for all of them. Consistently integrating the existing PIDs into data infrastructures can create a high level of interoperability, allowing connections to be built between data sets from different repositories based on common meta-information. In HMC, we start this process by implementing PIDs for people (ORCID) and organizations (ROR) in data infrastructures.

Harmonizing PID metadata, however, is only the first step in setting up a data space. Here we present the strategies we recommend for implementation within the Helmholtz Association and suggest which stakeholder groups should be involved, and held responsible for maintenance, in order to shape the Helmholtz Data Space. The conclusions from this process not only affect the implementation of PID metadata but may also be used for the harmonization of vocabularies, digital objects, interfaces, licenses, quality flags, and more, in order to connect our global data systems, redefine stakeholder responsibility, and ultimately reach the data space.
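Consistent use of PIDs also enables simple machine-side validation. For example, ORCID iDs carry an ISO 7064 mod 11-2 check digit that infrastructures can verify locally before storing a record; a sketch (the sample iD below is the well-known example identifier from ORCID's documentation):

```python
def orcid_checksum_ok(orcid_id):
    """Verify the ISO 7064 mod 11-2 check digit of an ORCID iD."""
    digits = orcid_id.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    check = (12 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[-1] == expected

print(orcid_checksum_ok("0000-0002-1825-0097"))  # True
```

A typo in a stored iD is thus caught at ingest time rather than surfacing later as a broken link between records.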

How to cite: Soeding, E., Poersch, A., Razeghi, Y., Kottmeier, D., and Malinovschii, S.: Reaching the Data Space – Standard data procedures and defining responsibilities for common data elements, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16749, https://doi.org/10.5194/egusphere-egu24-16749, 2024.

X2.33
|
EGU24-19849
|
ESSI2.8
|
ECS
Samuel Jennings, Kirsten Elger, Sven Fuchs, Florian Neumann, Ben Norden, Simone Frenzel, Stephan Maes, and Nikolas Ott

Increasing pressure from governing bodies and funding agencies to disseminate research data in an open and FAIR (Findable, Accessible, Interoperable, and Reusable) manner has led to a proliferation of online research portals of varying quality. Constructing and maintaining such portals is challenging, especially when left to individuals with a limited understanding of modern web architecture. For those starting out on this endeavour, an over-abundance of online advice, coupled with the rapid evolution of the “latest technologies”, can be overwhelming. The resulting uncertainty leads to technologically isolated portals with limited interoperability, which ultimately hinders the exchange of geoscientific information.

To reduce uncertainty for new initiatives, Geoluminate (https://geoluminate.github.io/geoluminate/) – a new micro web framework – offers a simple but robust platform for the rapid creation and deployment of new geoscience research portals. The framework's simplicity ensures that even those with limited expertise in web development can create and maintain effective portals that exhibit consistency in both design and functionality. Geoluminate aims to foster interoperability, reliability and decentralization of geoscience portals by providing a consistent and stable foundation on which they are built.

Leveraging existing features of the Python-based Django Web Framework, Geoluminate offers a comfortable learning curve for those already familiar with Python programming. On top of the feature-rich ecosystem of Django, Geoluminate offers additional features specifically tailored to the needs of geoscientific research portals. Geoluminate is highly opinionated and comes “batteries included”, so that research communities can focus on designing data models that fit their specific needs rather than on tedious implementation details.
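
The division of labour the abstract describes, community-specific data models on top of a shared foundation, can be illustrated with a plain-Python sketch. All names here are hypothetical illustrations and not the actual Geoluminate API, which is Django-based:

```python
from dataclasses import dataclass, field

# Hypothetical sketch: a community defines only its domain fields, while
# identifiers and contributor metadata come from a shared base class that
# stands in for the common framework foundation.
@dataclass
class BaseMeasurement:
    sample_id: str                                  # e.g. an IGSN
    contributors: list = field(default_factory=list)

@dataclass
class HeatFlowMeasurement(BaseMeasurement):
    depth_m: float = 0.0      # borehole depth in metres
    q_mW_m2: float = 0.0      # heat-flow value in mW/m^2

m = HeatFlowMeasurement(sample_id="10.60510/ABC123", depth_m=850.0, q_mW_m2=65.2)
print(m.q_mW_m2)
```

The point of the sketch is the separation of concerns: the community model adds only domain fields, everything generic is inherited.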

Currently backed by the international geothermal community as part of the World Heat Flow Database Project (http://heatflow.world/project), Geoluminate is under active development at the GFZ German Research Centre for Geosciences in Potsdam. Under the guidance of the partner repository GFZ Data Services, all data models are intrinsically tied to existing metadata standards (e.g. DataCite, IGSN, ROR, ORCID), such that data publishing is easily facilitated through established pathways.

Geoluminate champions the principles of open science and collaborative knowledge dissemination. This poster presentation aims to showcase the practical implementation and benefits of Geoluminate in creating geoscience research portals that align with FAIR data principles. By fostering a community-centric approach, Geoluminate contributes to the democratization of data management, enabling researchers to actively shape and enhance the landscape of those same portals they likely utilize in their own research.

How to cite: Jennings, S., Elger, K., Fuchs, S., Neumann, F., Norden, B., Frenzel, S., Maes, S., and Ott, N.: Geoluminate: A community-centric framework for the creation, deployment and ongoing development of decentralized geoscience data portals, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19849, https://doi.org/10.5194/egusphere-egu24-19849, 2024.

X2.34
|
EGU24-17279
|
ESSI2.8
Ivonne Anders, Peter Braesicke, Auriol Degbelo, Sibylle K. Hassler, Christin Henzen, Ulrike Kleeberg, Marie Ryan, and Hannes Thiemann

The National Research Data Infrastructure (NFDI) aims to create a sustainable and networked infrastructure for research data and helps to overcome the challenges associated with the storage, management and processing, security, and provision of research data in Germany [1]. It thus plays an important role in promoting open science and the exchange of FAIR research data. One of the NFDI initiatives is NFDI4Earth, which focuses on Earth System Sciences (ESS) [2]. Within the many ESS sub-disciplines, there is a diverse range of relevant high-quality data, services, tools, software, data repositories, as well as training and learning materials. Thus, it is not easy for researchers to find these various useful resources. Additionally, there is a lack of knowledge on how to use them due to an enormous diversity of standards, platforms, etc.

The NFDI4Earth OneStop4All addresses these issues by serving as the primary user-friendly access point (Web portal) to the relevant ESS resources. It gives a coherent overview of the (distributed) resources for research data management (RDM), and data analysis/data science that are made available by the members of the NFDI4Earth as well as the Earth System Science (ESS) community. In particular, the OneStop4All provides access to data and software repositories, subject-specific RDM articles and a learning management system for open educational resources relevant to ESS researchers. In addition, it guides users through the NFDI4Earth resources according to their specific ESS RDM and data science needs and capabilities. The OneStop4All also promotes seamless access to a distributed user support network.

The design and development of the OneStop4All is centered on the needs of the users. A good user experience requires understanding user behaviour, goals, motivations, and expectations, and incorporating this knowledge into every stage of the design process. To achieve this, we use methods from user-centered design (UCD), complemented by knowledge and experience in various ESS disciplines from the members of the NFDI4Earth consortium, their extended scientific networks and by directly involving the community.

We present the process of developing the user interface concept for the OneStop4All with regard to usability and user experience, and give first insights into the platform.

 

References

[1] Agreement between the Federal Government and the Länder concerning the Establishment and Funding of a National Research Data Infrastructure (NFDI) of 26 November 2018.

[2] NFDI4Earth Consortium (2022, July 7). NFDI4Earth - National Research Data Infrastructure for Earth System Sciences. Zenodo. https://doi.org/10.5281/zenodo.6806081

 

How to cite: Anders, I., Braesicke, P., Degbelo, A., Hassler, S. K., Henzen, C., Kleeberg, U., Ryan, M., and Thiemann, H.: Towards a user-friendly NFDI4Earth OneStop4All portal to support researchers in Earth System Sciences in Germany, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17279, https://doi.org/10.5194/egusphere-egu24-17279, 2024.

X2.35
|
EGU24-20387
|
ESSI2.8
Robert Huber, Alejandra Gonzalez Beltran, Charlotte Neidiger, Robert Ulrich, and Hervé L’Hours

Identifying, finding and gaining a sufficient overview of the functions and characteristics of data repositories and their catalogues is essential for users in the environmental and geosciences, as well as in other domains. However, achieving this within a reasonable amount of time and effort is not trivial, especially for less experienced users. This lack of transparent, human- and machine-friendly exposure of essential data repository information affects many stakeholders that need up-to-date and reliable information about data repositories in order to serve a broad range of users, for example search engines and registries such as GEOSS, re3data or FAIRsharing. Researchers need to be able to find FAIR-enabling trustworthy repositories to deposit, curate and preserve their own digital objects, as well as to reliably find FAIR data already gathered by others in order to reuse it. Assessment bodies such as CoreTrustSeal need transparent access to data repositories’ functions and characteristics in order to facilitate their certification process. An overview of the data and metadata standards, exchange services and interfaces offered by repositories is essential for data scientists to effectively integrate these into their workflows.

In this study we examine how seemingly self-evident information about the identity, purpose ('this is a data repository'), mandate and areas of responsibility of data repositories is exposed to humans and machines via websites and/or catalogues. We find that such information is difficult to locate and that, in many cases, machine-readable metadata is unclear, irrelevant or missing altogether. We also show that, despite all the efforts and successes in developing discipline-specific standards over the last decades, these are insufficiently linked from more domain-agnostic standards. This absence of domain-specific information in PID systems and search engines renders it largely invisible in the FAIR ecosystem. In particular, relevant metadata representations or links to discipline-specific, standardised services, such as the Open Geospatial Consortium (OGC) suite of services, are rarely exposed.
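
The kind of machine-readable self-declaration the study finds missing can be expressed with generic standards such as schema.org JSON-LD; a minimal, purely illustrative example (the repository name and URL are invented):

```python
import json

# Minimal schema.org JSON-LD stating "this is a data repository".
# All values are illustrative, not a real repository record.
record = {
    "@context": "https://schema.org/",
    "@type": "DataCatalog",   # schema.org type used for data catalogues/repositories
    "name": "Example Geoscience Repository",
    "description": "Curated repository for geoscience research data.",
    "url": "https://repo.example.org",
    "keywords": ["geoscience", "FAIR", "data repository"],
}
print(json.dumps(record, indent=2))
```

Embedding such a block in a repository's landing page makes its identity and purpose harvestable by search engines and registries.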

In this paper, we present the simple and effective methods being elaborated within the FAIR-IMPACT project to improve this situation using existing and emerging methods and standards. We will show effective ways for repositories to expose service information and standards via typed-link-based signposting, as currently summarised in the FAIRiCAT approach. We will evaluate options for implementation across domain-specific metadata as well as domain-independent formats such as DCAT or schema.org, and show how they can be used in combination with FAIRiCAT in practice. We will also present methods for exposing the FAIR status of digital objects and the FAIR-enabling and trustworthiness status of data repositories, to improve cooperation and information exchange between data repositories, registries, assessment providers and certification authorities.
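
Typed-link-based signposting of the sort summarised in FAIRiCAT can be as lightweight as an HTTP `Link` header advertising metadata and citation targets; a sketch with invented URLs and a common pair of relation types:

```python
# Build an HTTP Link header carrying typed links (Signposting style).
# The URLs are illustrative placeholders, not real resources.
links = [
    ("https://repo.example.org/meta/ds1.jsonld", "describedby", "application/ld+json"),
    ("https://doi.org/10.1234/example", "cite-as", None),
]

header = ", ".join(
    f'<{url}>; rel="{rel}"' + (f'; type="{typ}"' if typ else "")
    for url, rel, typ in links
)
print(header)
```

A client following the `describedby` link retrieves machine-readable metadata without scraping the landing page.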

How to cite: Huber, R., Gonzalez Beltran, A., Neidiger, C., Ulrich, R., and L’Hours, H.: Towards Transparent Presentation of FAIR-enabling Data Repository Functions & Characteristics, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20387, https://doi.org/10.5194/egusphere-egu24-20387, 2024.

X2.36
|
EGU24-17495
|
ESSI2.8
Valentina Protopopova-Kakar, Florian Ott, Kirsten Elger, Melanie Lorenz, and Wolfgang zu Castell

A core element of the National Research Data Infrastructure (NFDI) initiative in Germany is the ambition to harmonize the research data landscape not only on a national level, but also to connect to and intertwine with international initiatives in Research Data Management (RDM). For increasing interoperability between different research data domains, metadata standardization, controlled vocabularies, application programming interfaces and the setup of different service interfaces are key areas of interest. As such, the NFDI is the German contributor to the European Open Science Cloud (EOSC) and strives to become a central contact point between German and international stakeholders. To achieve such a harmonized, interoperable and international data landscape, the NFDI Consortium for Earth System Sciences (NFDI4Earth) promotes common standards within the national Earth System Science (ESS) community and supports the development of new RDM pathways by connecting to and actively participating in international initiatives. NFDI4Earth also strives to foster a cultural change towards increased awareness of FAIR (Findable, Accessible, Interoperable, Reusable) and Open Science principles in Germany. A user-friendly technical infrastructure, meaningful services and up-to-date educational resources are all important elements of NFDI4Earth for advancing the cultural shift in the ESS research community towards FAIR and open research data management. Another important part of this cultural change is to acknowledge data and software publications as scientific merit and to recognize them as part of scientific achievements.

How to cite: Protopopova-Kakar, V., Ott, F., Elger, K., Lorenz, M., and zu Castell, W.: The importance of interlinking Research Data Infrastructures and Research Data Management initiatives, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17495, https://doi.org/10.5194/egusphere-egu24-17495, 2024.

X2.37
|
EGU24-19436
|
ESSI2.8
|
ECS
Florian Ott, Kirsten Elger, Simone Frenzel, Alexander Brauser, and Melanie Lorenz

The ongoing digitalisation, together with new methods for inter- and transdisciplinary research (e.g., AI and ML), has triggered the development of large research infrastructures across the Earth and environmental sciences (e.g. EPOS, ENVRI-FAIR or the German NFDI) and increasing demands for seamless data integration and visualisation, which require interoperable data formats and the use of agreed metadata standards. Especially for data-intensive disciplines in geophysics and geodesy, metadata standards are important, already in place and widely adopted (e.g. RINEX/SINEX formats for GNSS data and GeodesyML metadata for GNSS stations; miniSEED format and FDSN metadata recommendations for seismological data). In addition, it becomes increasingly relevant to connect research outputs (papers, data, software, samples) with each other and with the originating researchers and institutions in a unique and machine-readable way. The use of persistent identifiers (like DOI, ORCID, ROR, IGSN) and descriptive linked-data vocabularies/ontologies in the metadata associated with research outcomes strongly supports these tasks.

In this presentation, we will elaborate on the role and potential of domain-specific research data repositories for the process described above. Domain repositories are digital archives that manage and preserve curated research data (and/or software and sample descriptions) from specific scientific disciplines. The metadata associated with the DOI-referenced objects is specific to the domain and richer than generic metadata intended to describe data across many scientific disciplines. Domain repositories often offer data curation by domain researchers and data specialists, and ensure that relevant persistent identifiers are included in the standardised XML or JSON metadata for data discovery, complementing the disciplinary metadata described above.
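
The PID-rich discovery metadata described above can be sketched as a DataCite-style JSON fragment; the DOI, ORCID and IGSN values below are invented placeholders:

```python
import json

# Hypothetical DataCite-style fragment linking a dataset to its creator
# (ORCID), host institution (ROR) and originating sample (IGSN).
metadata = {
    "doi": "10.5880/EXAMPLE.2024.001",
    "creators": [{
        "name": "Doe, Jane",
        "nameIdentifiers": [{
            "nameIdentifier": "https://orcid.org/0000-0000-0000-0000",
            "nameIdentifierScheme": "ORCID",
        }],
    }],
    "relatedIdentifiers": [{
        "relatedIdentifier": "10.60510/IGSN.EXAMPLE",
        "relatedIdentifierType": "IGSN",
        "relationType": "IsDerivedFrom",   # the data derive from this sample
    }],
}
print(json.dumps(metadata, indent=2))
```

It is the `relatedIdentifiers` block that makes the output–sample–person graph machine-traversable.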

Our example is GFZ Data Services, the domain repository for geosciences data hosted at the GFZ German Research Centre for Geosciences. GFZ Data Services has several partnerships with large international research infrastructures, like EPOS, GEOROC and the World Heat Flow Database Project, and provides data publication services to several geodetic data services of the International Association of Geodesy (ICGEM, IGETS, ISG). Our examples clearly delineate the roles of each partner and the benefits of the partnership for the overarching task of Open Science.

How to cite: Ott, F., Elger, K., Frenzel, S., Brauser, A., and Lorenz, M.: The role of research data repositories for large integrative research infrastructures, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19436, https://doi.org/10.5194/egusphere-egu24-19436, 2024.

X2.38
|
EGU24-18037
|
ESSI2.8
|
ECS
Alexander Brauser, Kirsten Elger, Linda Baldewein, Simone Frenzel, Ulrike Kleeberg, Birgit Heim, Ben Norden, and Mareike Wieczorek

In many scientific disciplines, physical samples represent the origin of research results. They record unique events in history, support new hypotheses, and are often not reproducible. At the same time, samples are essential for reproducing and verifying research results and deriving new results by analysing existing samples with new methodology. Consequently, the inclusion of sample metadata in the digital data curation processes is an important step to provide the full provenance of research results. The largest challenge is the lack of standardisation and the large variety of sample types and individuals involved: Most samples are collected by individual researchers or small groups that may have internal agreements for sample descriptions, but these might only be used for one expedition or within a small community, and rarely reach beyond institutional boundaries.

The International Generic Sample Number (IGSN, www.igsn.org) is a globally unique, resolvable, and persistent identifier (PID) for physical samples with a dedicated metadata schema supporting discovery on the internet. IGSNs allow data and publications to be linked directly to the samples from which they originate and provide contextual information about a particular sample online.

The aim of the project FAIR WISH (FAIR Workflows to establish IGSN for Samples in the Helmholtz Association), funded by the Helmholtz Metadata Collaboration (HMC), was to work towards more standardisation of rich sample descriptions. Project outcomes include (i) standardised, rich and discipline-specific IGSN metadata schemas for different physical sample types within the Earth and Environmental sciences, (ii) workflows to generate machine-readable IGSN metadata from different states of digitisation and (iii) the FAIR SAMPLES Template.

The FAIR SAMPLES Template enables metadata collection and batch upload of samples across sample hierarchies (parents and children at different hierarchy levels) at once. The ability of individual researchers or research teams to fill in the template, or to create scripts that fill it directly from databases for a wide range of sample types, makes it flexible and widely applicable. The structured metadata captured with the FAIR SAMPLES Template and converted into XML files already represents an important step towards the standardisation of rich sample descriptions and their provision in machine-readable form.
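
The conversion of tabular template rows into machine-readable XML can be sketched with the Python standard library; the element names below are illustrative and not the exact IGSN schema:

```python
import xml.etree.ElementTree as ET

# Two template rows: a parent sample and one child, converted to simple XML.
rows = [
    {"igsn": "IGSN-PARENT-1", "name": "Core A", "parent": None},
    {"igsn": "IGSN-CHILD-1", "name": "Core A, section 1", "parent": "IGSN-PARENT-1"},
]

root = ET.Element("samples")
for row in rows:
    s = ET.SubElement(root, "sample", igsn=row["igsn"])
    ET.SubElement(s, "name").text = row["name"]
    if row["parent"]:
        # Hierarchy is preserved by referencing the parent's IGSN.
        ET.SubElement(s, "parentIGSN").text = row["parent"]

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Because the hierarchy is encoded as IGSN references rather than nesting, parents and children can be uploaded in one batch in any order.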

Standardised workflows for metadata documentation and compliance with international metadata standards address the challenges associated with reproducibility of samples and their insufficient documentation. The developments within the FAIR WISH project provide a foundation for a more collaborative and integrated scientific enterprise. Future efforts in this area can build on this framework to further improve the accessibility and interoperability of sample data and advance the collective understanding of Earth's environmental processes.

How to cite: Brauser, A., Elger, K., Baldewein, L., Frenzel, S., Kleeberg, U., Heim, B., Norden, B., and Wieczorek, M.: Towards more standards for sample descriptions: The FAIR WISH project, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18037, https://doi.org/10.5194/egusphere-egu24-18037, 2024.

X2.39
|
EGU24-18182
|
ESSI2.8
Anna-Lena Flügel, Beate Krüss, Heinrich Widmann, Hannes Thiemann, Stephan Kindermann, and Fanny Adloff

Climate change is one of the most pressing global challenges, on which researchers from around the world and from various disciplines are working together. Given the demands regarding the use of openly accessible data in their own as well as other research domains, providing services embedded in European and international infrastructures has always been crucial for climate model researchers. Therefore, the Horizon Europe project FAIRCORE4EOSC established the case study “Climate Change” to demonstrate how researchers, the Earth System Science (ESS) community and wider user communities can benefit from the components developed in FAIRCORE4EOSC.

Within FAIRCORE4EOSC, the German Climate Computing Center (DKRZ) examines the possibility of integrating EOSC (European Open Science Cloud) and IS-ENES (Infrastructure for the European Network for Earth System Modelling) services within the Climate Change case study to address some of the data challenges of the ESS community. For example, a huge data space exists in ENES which cannot be found in EOSC, at either fine-granular or coarse-granular level. For some ENES data collections, DataCite DOIs are assigned, but these usually refer to thousands of data objects that need to be grouped into different levels of aggregation, for which no PIDs are currently available. Additionally, data still lack the context formed by producers, experiments, projects, devices, etc., which is crucial for interdisciplinary re-use, as well as metadata crosswalks.

To address these challenges, the Climate Change case study investigates the benefits of four FAIRCORE4EOSC components: RAiD (Research Activity Identifier Service), PIDGraph, DTR (Data Type Registry) and MSCR (Metadata Schema and Crosswalk Registry). The goal is to improve discoverability and reusability of data collections at all levels of granularity, and to link data to experiments and projects. 

In this case study, selected ENES data collections will receive identifiers using Kernel Information Types developed in FAIRCORE4EOSC as well as the DTR contents. The assignment of RAiDs to projects/experiments provides domain-agnostic users with an aggregated view of the entities (data, software, people involved, etc.) from data generation by the Earth System modellers up to the publication of final assessment reports by IPCC authors. These metadata will be supplied to Open Science Graphs and represented within the PIDGraph, which visualises the context and interlinking for a specific research project based on DOIs and RAiDs. In addition to the identifiers, the scientific metadata are also made available. Improving information that enables meaningful crosswalks is important and is supported by the features of the DTR and MSCR. The DTR offers the possibility to register and assign a PID to a data type (e.g. measurement unit, info type, schema) and ensures a machine-actionable standardisation of PID metadata for data objects. The Climate Change case study will use DTRs for persistent Climate and Forecast (CF) convention variable definitions. The MSCR can then be used to create machine-actionable unit conversions or variable mappings based on DTR data types. This focus on improving the prerequisites for machine-aided analytics, including semantic aspects, is of high priority due to the commonly large data volumes and the highly interdisciplinary requirements in climate science.
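
The interplay of DTR and MSCR described above might look like this in miniature; the PID, the variable definition and the registered conversion are all invented for illustration:

```python
# Miniature sketch: a registry resolves a data-type PID to a CF-style
# variable definition, and a crosswalk applies a registered unit conversion.
data_type_registry = {
    "21.T11148/abc": {"cf_name": "air_temperature", "unit": "K"},  # hypothetical PID
}
crosswalks = {
    ("K", "degC"): lambda v: v - 273.15,   # registered Kelvin -> Celsius mapping
}

def convert(pid, value, target_unit):
    """Look up the data type behind a PID and apply the matching crosswalk."""
    entry = data_type_registry[pid]
    return crosswalks[(entry["unit"], target_unit)](value)

print(convert("21.T11148/abc", 293.15, "degC"))
```

Because both the variable definition and the conversion are registered and PID-addressable, a machine agent can perform the conversion without human interpretation.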

How to cite: Flügel, A.-L., Krüss, B., Widmann, H., Thiemann, H., Kindermann, S., and Adloff, F.: Case study Climate Change : How Earth System Science benefits from FAIRCORE4EOSC components, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18182, https://doi.org/10.5194/egusphere-egu24-18182, 2024.

X2.40
|
EGU24-20307
|
ESSI2.8
Chengbin Wang

Geoscience data associated with mineral resource surveys have become essential digital assets for governments and mining companies. The rapid increase in the volume of geoscience data makes it challenging to acquire knowledge quickly. In this study, we proposed and built a workflow that employs knowledge graph techniques, deep learning, question templates, and matching algorithms to provide a question-answering service for geologists involved in mineral resource surveys. Initially, we utilized deep-learning-based recognition of geological entities and their semantic relations, along with relational data mapping, to construct the mineral resource survey knowledge graph based on the ontology model. We then employed question template matching, a geological entity recognition model, and a sentence transformer to determine the optimal question template and generate a query statement for knowledge acquisition from the knowledge graph in the Cypher language. Subsequently, we utilized a subgraph and a short abstract to express the results. This study demonstrates the utility of such workflows for providing knowledge services in the field of mineral resource surveys. The results also suggest that further studies on geoscience pre-trained models, an informative library of question templates, and multimodal knowledge graphs are necessary to improve the performance of the knowledge graph-driven question-answering system.
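
The template-matching step can be illustrated with a toy sketch: a recognised entity is slotted into the best-matching question template to yield a Cypher query. The template, graph labels and relation names below are hypothetical, not the paper's actual schema:

```python
# Toy sketch of template-based Cypher generation for a mineral-survey KG.
templates = {
    "deposits_of_mineral": (
        "which deposits contain {m}",
        "MATCH (d:Deposit)-[:CONTAINS]->(m:Mineral {{name: '{m}'}}) RETURN d.name",
    ),
}

def to_cypher(question, entity):
    """Fill the matched template with the recognised entity.

    In the real workflow a sentence transformer scores all templates
    against the question; here we pick the single template directly.
    """
    _pattern, cypher = templates["deposits_of_mineral"]
    return cypher.format(m=entity)

q = to_cypher("Which deposits contain gold?", "gold")
print(q)
```

The generated query can then be run against a Neo4j-style graph, and the returned subgraph summarised for the user.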

How to cite: Wang, C.: A Knowledge Graph-Driven Question Answering System for Mineral Resource Survey , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20307, https://doi.org/10.5194/egusphere-egu24-20307, 2024.

X2.41
|
EGU24-18193
|
ESSI2.8
|
ECS
Leander Kallas, Marthe Klöcking, Lucia Profeta, Stephen Richard, Annika Johansson, Kerstin Lehnert, Manja Luzi-Helbing, Bärbel Sarbas, Hannah Sweets, Dieter Garbe-Schönberg, Matthias Willbold, and Gerhard Wörner

Global compilations of geo- and cosmochemical data are increasingly leveraged to address exciting new research questions through data-analytics and machine-learning approaches. These invaluable datasets are maintained and made accessible as synthesis databases, such as GEOROC and PetDB catering to terrestrial igneous and metamorphic rocks; AstroMat Data Synthesis encompassing diverse astromaterial samples; and GeoReM a comprehensive resource for geochemical, environmental and biological reference materials. The GEOROC and PetDB databases for igneous and metamorphic rocks collectively aggregate data from thousands of publications, combining over 42 million single data values (major and trace elements, stable and radiogenic isotope ratios, radiometric ages) for bulk rock, glass, as well as minerals and their inclusions.

The diverse focus of these data systems, which include data from different sources with heterogeneous metadata, makes data integration and interoperability challenging. The DIGIS and EarthChem projects are working towards designing machine-readable unified vocabularies for their data systems to achieve full interoperability. These vocabularies, associated with primary chemical data as well as geospatial, analytical and sample metadata, encompass many categories describing geographic location, sampling technique, lithology and mineral types, geological and tectonic setting, as well as analytes, analytical methods, reference materials, and more.

Wherever possible, machine- and/or human-readable external vocabularies from respected authorities are incorporated, such as Mindat’s “Subdivisions of Rock”, the International Mineralogical Association’s “List of Minerals” (Warr, 2021), and the International Union of Pure and Applied Chemistry’s chemical terminologies. For the remaining categories, a set of local vocabularies is being developed by our group (e.g. analytical methods, see Richard et al., 2023). The collaborative effort between DIGIS, EarthChem, and the Astromaterials Data System is leading to an advanced vocabulary ecosystem relating samples, data, and analytical methods in geo- and cosmochemical research that reaches from local to community-driven and, eventually, global connections.
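
Linking local terms to external authority vocabularies can be sketched as a simple mapping table with synonym resolution; the URI and synonym below are illustrative placeholders:

```python
# Illustrative mapping of local lithology terms to external vocabulary URIs,
# plus a synonym index so any label resolves to the same concept.
vocabulary = {
    "basalt": {
        "uri": "https://example.org/vocab/lithology/basalt",  # placeholder URI
        "synonyms": ["basaltic rock"],
    },
}

index = {}
for term, entry in vocabulary.items():
    index[term] = entry["uri"]
    for syn in entry["synonyms"]:
        index[syn] = entry["uri"]

print(index["basaltic rock"])
```

Two databases that both resolve their local labels to the same URI become interoperable at the concept level, regardless of their internal naming.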

Establishing a globally accepted vocabulary not only contributes to building interoperability between our existing geo-and cosmochemistry synthesis databases, but will also help pave the way toward interoperability with the GeoReM database, linking data with analytical methods and reference materials to provide means for data quality control and assessment of analytical uncertainty.

Finally, the unified vocabularies of EarthChem, GEOROC, and GeoReM will advance the creation of a global network of geochemical data systems as promoted by the OneGeochemistry initiative (Klöcking et al., 2023; Prent et al., 2023), connecting and integrating the broadest range of geoanalytical data generated, for example, in studies of environmental samples, archeological artefacts, or geohealth matters.

We report on these goals, achievements, current progress and challenges, and seek community engagement and feedback.

 

References

Klöcking, M. et al. (2023). Community recommendations for geochemical data, services and analytical capabilities in the 21st century. Geochimica et Cosmochimica Acta, 351, 192–205.

Prent, A. et al. (2023). Innovating and Networking Global Geochemical Data Resources Through OneGeochemistry. Elements, 19(3), 136–137.

Richard, S. et al. (2023). Analytical Methods for Geochemistry and Cosmochemistry. Concept Scheme for Analysis Methods in Geo- and Cosmochemistry. Research Vocabularies Australia.

Warr, L. N. (2021). IMA–CNMNC approved mineral symbols. Mineralogical Magazine, 85(3), 291–320.

How to cite: Kallas, L., Klöcking, M., Profeta, L., Richard, S., Johansson, A., Lehnert, K., Luzi-Helbing, M., Sarbas, B., Sweets, H., Garbe-Schönberg, D., Willbold, M., and Wörner, G.: Unified Vocabularies for Geo- and Cosmochemical Data Systems, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18193, https://doi.org/10.5194/egusphere-egu24-18193, 2024.

X2.42
|
EGU24-18751
|
ESSI2.8
Andrea Lammert, Claudia Martens, and Aenne Loehden

The exponential growth of data due to technological developments, along with the increased recognition of research data as relevant research output during the last decades, poses fundamental challenges in terms of interoperability, reproducibility and reuse of scientific information. Being cross-disciplinary at its core, research in Earth System Science comprises divergent domains such as Paleontology, Marine Science, Atmospheric Sciences and Molecular Biology, in addition to different types of data such as observation and simulation data. Within the various disciplines, distinct methods and terms for indexing, cataloguing, describing and finding scientific data have been developed, resulting in several controlled Vocabularies, Taxonomies and Thesauri. However, given the semantic heterogeneity across scientific domains, effective utilisation and (re)use of data is impeded, while the importance of enhanced and improved interoperability across research areas will increase even further, considering the global impact of Climate Change on literally all aspects of everyday life. There is thus a clear need to harmonise practices around the development and usage of semantics in representing and describing information and knowledge.

Using Ontologies (as a formal mechanism for defining terms and their relations) can help to address this issue, especially with regard to discovery, comprehension and metadata enrichment. If used and maintained, Ontologies also encourage metadata standardisation, ideally across disciplines. Examples of enhanced search options include (but are not limited to): term relations for variables as well as for topics and locations; synonyms and homonyms; an autocomplete function for search terms; and support for multiple languages. Indexing of research data can be improved using Ontologies, e.g. by proposing terms for variable names or measurement units. Depending on their richness, Ontologies ease, for example, finding, comprehension, processing, and reuse, both for human users and for automatic reasoning and processing.
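
Two of the search enhancements listed above, synonym expansion and autocomplete over a controlled vocabulary, can be sketched in a few lines; the terms are toy examples:

```python
# Toy ontology-backed search helpers: synonym expansion and autocomplete.
synonyms = {
    "sea surface temperature": ["sst"],
    "precipitation": ["rainfall"],
}

def expand(query):
    """Return the query plus any terms the vocabulary links to it."""
    terms = [query]
    for canonical, alts in synonyms.items():
        if query == canonical:
            terms += alts
        elif query in alts:
            terms.append(canonical)
    return terms

def autocomplete(prefix):
    """All canonical terms starting with the prefix."""
    return sorted(t for t in synonyms if t.startswith(prefix))

print(expand("sst"))
print(autocomplete("pre"))
```

A search for the abbreviation "sst" then also retrieves datasets indexed under the full variable name, which is exactly the cross-vocabulary findability the abstract argues for.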

Ontologies can represent different levels of granularity, connecting domain-specific Vocabularies such as the Climate and Forecast (CF) conventions with generic Taxonomies for, e.g., scientific disciplines or funding policies, thus extending the reach of scientific data to other user groups such as journalists, politicians or citizens.

For a beneficial usage of semantic artefacts, sustainability is key: any kind of terminology service must be maintained to guarantee that terms and relations are offered in a persistent way. But if they are, Vocabularies, Taxonomies, Thesauri and Ontologies can serve as a driving force for improved visibility and findability of research output within and across different research areas. Why Ontologies matter, what they are, and how they can be used will be depicted on our poster in an easy-to-understand way.

How to cite: Lammert, A., Martens, C., and Loehden, A.: Benefits of Ontologies in Earth System Science, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18751, https://doi.org/10.5194/egusphere-egu24-18751, 2024.

X2.43
|
EGU24-20238
|
ESSI2.8
|
ECS
Christin Henzen, Auriol Degbelo, Jonas Grieb, Robin Heß, Ralf Klammer, Roland Koppe, Christof Lorenz, and Claudia Müller

Research data in the Earth System Sciences (ESS) are managed in diverse repositories with varying aims, publishing and curation approaches, as well as technical solutions. The resulting heterogeneity often hampers the implementation of interoperability and harvesting concepts. From the perspective of researchers pursuing integrative, data-driven research questions across repository borders, this leads to ineffective search and reuse of data. We consider it vital to train researchers to provide high-quality FAIR data and metadata. However, it is even more important to enable repository providers to act as multipliers, as this allows them to offer researchers suitable repository solutions, for example by implementing fit-for-purpose metadata schemas and interfaces.

In Germany, several initiatives serve as umbrellas for joint activities with ESS repository providers. In a collaboration between the German national research data infrastructure for Earth System Sciences (NFDI4Earth) and the Helmholtz Metadata Collaboration (HMC), we have developed a roadmap that enables repository providers to meet the needs of researchers as well as technical requirements.

As an initial step, we developed recommendations in a community-driven process across NFDI4Earth and HMC. These recommendations provide common steps to foster interoperability, particularly with regard to search and harvesting. Moreover, we have identified a first set of use cases for specific types of ESS data that complement the developed recommendations, e.g. underway measurements of seawater temperature. Through regular updates in the form of community consultations and workshops, we will identify further community needs and support updates and developments of metadata standards, e.g. the implementation of underway measurements in GeoDCAT. In this contribution, we describe our recommendations, use cases, and lessons learned from this community-driven process to enable repository providers.
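
A harvestable DCAT-style description of, say, an underway seawater-temperature dataset might look like the following JSON-LD fragment; the title, URL and cruise name are invented for illustration:

```python
import json

# Illustrative DCAT-style JSON-LD for an underway sea surface temperature dataset.
dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@type": "dcat:Dataset",
    "dct:title": "Underway sea surface temperature, cruise EX-2024 (example)",
    "dcat:keyword": ["underway", "sea surface temperature"],
    "dcat:distribution": [{
        "@type": "dcat:Distribution",
        "dcat:mediaType": "text/csv",
        "dcat:downloadURL": "https://repo.example.org/ex2024/sst.csv",
    }],
}
print(json.dumps(dataset, indent=2))
```

Repositories that expose such records through a common endpoint can be harvested uniformly, which is the interoperability goal the recommendations address.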

How to cite: Henzen, C., Degbelo, A., Grieb, J., Heß, R., Klammer, R., Koppe, R., Lorenz, C., and Müller, C.: When Metadata crosses Borders - Enabling Repository Providers with Joint Forces in Earth System Sciences, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20238, https://doi.org/10.5194/egusphere-egu24-20238, 2024.