ESSI3.3 | Collaborative Science Through Free and Open Source Software Tools and Frameworks in Earth Sciences
PICO
Co-organized by GI1, co-sponsored by SSI and AGU
Convener: Kaylin Bugbee | Co-conveners: Christopher Kadow, Ionut Cosmin Sandric, Paul Kucera, George P. Petropoulos
PICO | Fri, 19 Apr, 10:45–12:30 (CEST)
In recent decades, the use of geoinformation technology has become increasingly important in understanding the Earth's environment. This session focuses on modern open-source software tools, including those built on top of commercial GIS solutions, developed to facilitate the analysis of mainly geospatial data in various branches of geosciences. Earth science research has become more collaborative with shared code and platforms, and this work is supported by Free and Open Source Software (FOSS) and shared virtual research infrastructures utilising cloud and high-performance computing. Contributions will showcase practical solutions and applications based on FOSS, cloud-based architecture, and high-performance computing to support information sharing, scientific collaboration, and large-scale data analytics. Additionally, the session will address the challenges of comprehensive evaluations of Earth Systems Science Prediction (ESSP) systems, such as numerical weather prediction, hydrologic prediction, and climate prediction and projection, which require large storage volumes and meaningful integration of observational data. Innovative methods in open frameworks and platforms will be discussed to enable meaningful and informative model evaluations and comparisons for many large Earth science applications from weather to climate to geo in the scope of Open Science 2.0.

PICO: Fri, 19 Apr | PICO spot 4

Chairpersons: Kaylin Bugbee, Christopher Kadow, Ionut Cosmin Sandric
10:45–10:47 | PICO4.1 | EGU24-271 | On-site presentation
Ingo Simonis, Marie-Francoise Voidrot, Rachel Opitz, and Piotr Zaborowski
Collaborative Open Science is essential to addressing complex challenges whose solutions prioritize integrity and require cross-domain integrations. Today, building workflows, processes, and data flows across domains and sectors remains technically difficult and practically resource intensive, creating barriers to whole-systems change. While organizations increasingly aim to demonstrate accountability, they often lack the tools to take action effectively. By making it simple to connect data and platforms together in transparent, reusable and reproducible workflows, the OGC Open Science Persistent Demonstrator (OSPD) aims to enable responsible innovation through collaborative open science. The OSPD focuses specifically on using geospatial and earth observation (EO) data to enable and demonstrate solutions that create capacity for novel research and accelerate the practical implementation of this research.
Collaborative Open Science and FAIR (Findable, Accessible, Interoperable, Reusable) data are widely recognized as critical tools for taking advantage of the opportunities created through addressing complex social and environmental challenges. To date, many millions have been invested in hundreds of initiatives to enable access to analytical tools, provide data management, data integration and exchange, translate research results, and support reproduction and testing of workflows for new applications. These investments have resulted in a plethora of new data, protocols, tools and workflows, but these resources frequently remain siloed, difficult to use, and poorly understood, and as a result they are falling short of their full potential for wider impact and their long term value is limited.
This presentation will illustrate how the OGC OSPD Initiative, through its design, development and testing activities, provides answers to leading questions such as:
  • How can we design Open Science workflows that enable integration across platforms designed for diverse applications used in different domains to increase their value?
  • How can we lower barriers for end users (decision makers, managers in industry, scientists, community groups) who need to create Open Science workflows, processes, and data flows across domains and sectors?
  • How can Open Science workflows and platforms enable collaboration between stakeholders in different domains and sectors?
  • How can we empower organizations to demonstrate accountability in their analytical workflows, data, and representations of information through Open Science?
  • What Open Science tools do organizations need to take action effectively?
  • How can Open Science and FAIR data standards practically support accountability?
  • How can we make it simple to connect data and platforms together in transparent, reusable and reproducible (FAIR) workflows?
  • What are the specific challenges of using geospatial, earth observation (EO), and complementary data in this context?

How to cite: Simonis, I., Voidrot, M.-F., Opitz, R., and Zaborowski, P.: Open Science Collaboration across Earth Observation Platforms, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-271, https://doi.org/10.5194/egusphere-egu24-271, 2024.

10:47–10:49 | EGU24-1445 | ECS | Virtual presentation
Christina Lekka, George P. Petropoulos, Vasileios Anagnostopoulos, Spyridon E. Detsikas, Petros Katsafados, and Efthimios Karympalis

Mathematical models are widely used today to study the intricate physical processes and interactions among the different components of the Earth’s system. Such models are often used synergistically with Earth Observation (EO) data, allowing users to derive spatiotemporal estimates of key parameters characterising land surface interactions. This synergy combines the horizontal coverage and spectral resolution of EO data with the vertical coverage and fine temporal continuity of such models. SimSphere is a mathematical model belonging to the family of Soil Vegetation Atmosphere Transfer (SVAT) models. As a software toolkit, it has been developed in Java and is used either as a stand-alone application or synergistically with EO data. Use of the model is constantly expanding worldwide, both as an educational and as a research tool. Herein we present recent advancements introduced to SimSphere. We have comprehensively tested and updated the model code and added new functionalities, which are illustrated using a variety of case studies. For example, we present the new functionality that allows the model to be applied over complex/heterogeneous landscapes, demonstrated in experimental settings in various European ecosystems. The present study contributes to ongoing efforts by the model’s user community and is also very timely, given the increasing interest in SimSphere, particularly towards the development of EO-based operational products characterising the Earth’s water cycle. The research presented herein has been conducted in the framework of the project LISTEN-EO (DeveLoping new awareness and Innovative toolS to support efficient waTer rEsources management Exploiting geoinformatiOn technologies), funded by the Hellenic Foundation for Research and Innovation programme (ID 015898).

Keywords: SVAT, SimSphere, Earth Observation, land surface interactions, LISTEN-EO

How to cite: Lekka, C., Petropoulos, G. P., Anagnostopoulos, V., Detsikas, S. E., Katsafados, P., and Karympalis, E.: A Software Toolkit for Advancing our Understanding of Land Surface Interactions: Recent developments to the SimSphere SVAT model, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1445, https://doi.org/10.5194/egusphere-egu24-1445, 2024.

10:49–10:51 | PICO4.2 | EGU24-1707 | On-site presentation
Jens Klump, Chau Nguyen, John Hille, and Michael Stewart

The corpus of abstracts from the EGU General Assemblies 2000–2023 covers a wide range of Earth, planetary and space science topics, each with multiple subtopics. The abstracts are all in English, fairly uniform in length, cover one broad subject area, and are licenced under a permissive licence that allows further processing (CC BY 4.0), making this a high-quality text corpus for studies using natural language processing (NLP) and for the finetuning of Large Language Models (LLMs). Our study makes use of openly available NLP software libraries and LLMs.

In the first phase of this study, we were interested in finding out how well abstracts map to the topics covered by EGU Divisions and whether co-organisation of sessions contributes to or dilutes topics. The abstracts are available only in unstructured formats such as Portable Document Format (PDF) or plain text in XML extracts from the conference database. They are identified by abstract numbers but carry no information on the session or division where they were originally presented. We reconstructed this information from the online conference programme.

To employ a supervised learning approach to matching abstracts to topics, we defined the topics to be synonymous with the 23 scientific divisions of the EGU, using the division and co-listed divisions as topic labels.

We finetuned the Bidirectional Encoder Representations from Transformers (BERT) and the slightly simplified DistilBERT language models for our topic modelling exercise. We also compared the machine classifications against a random association of abstracts and topics. Preliminary results from our experiments show that the machine learning models perform well in classifying the conference abstracts (accuracy = 0.66). The accuracy varies between divisions (0.40 for NP to 0.96 for G) and improves when taking co-organisation between divisions into account. Starting from one year of abstracts (EGU 2015), we plan to expand our analysis to cover all abstracts from all EGU General Assemblies (EGU 2000–2024).
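The supervised set-up described above can be sketched in a few lines of plain Python (this is illustrative, not the authors' code): each abstract's acceptable labels are its primary division plus any co-listed divisions, and accuracy can be scored strictly or with credit for co-organisation.

```python
def accuracy(predictions, abstracts, credit_co_listed=True):
    """Score predicted divisions against (primary, co-listed) labels.

    With credit_co_listed=True, a prediction also counts as correct when
    it matches a co-listed division, mirroring how reported accuracy
    improves when co-organisation between divisions is taken into account.
    """
    hits = 0
    for pred, (primary, co_listed) in zip(predictions, abstracts):
        accepted = {primary} | set(co_listed) if credit_co_listed else {primary}
        hits += pred in accepted
    return hits / len(abstracts)

# Invented toy data: (primary division, co-listed divisions) per abstract
abstracts = [("ESSI", {"GI"}), ("NP", set()), ("G", set())]
preds = ["GI", "NP", "CL"]
print(accuracy(preds, abstracts, credit_co_listed=False))  # strict scoring
print(accuracy(preds, abstracts))  # credits co-listed divisions
```

In the real experiment the predictions would come from the finetuned BERT/DistilBERT classifiers; the scoring logic is independent of the model.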

How to cite: Klump, J., Nguyen, C., Hille, J., and Stewart, M.: Topic Analysis and Classification of EGU Conference Abstracts, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1707, https://doi.org/10.5194/egusphere-egu24-1707, 2024.

10:51–10:53 | PICO4.3 | EGU24-5658 | ECS | On-site presentation
Laura Paredes-Fortuny, Filippo Dainelli, and Paolo Colombo

Tropical Cyclones (TCs) are synoptic-scale storm systems rapidly rotating around a center of low atmospheric pressure, which primarily derive their energy from exchanges of heat and moisture between the air and sea. These cyclones are among the most impactful geophysical phenomena, inflicting substantial economic damage and numerous fatalities. Key hazards associated with TCs include intense winds, extreme rainfall, and storm surges, which frequently result in extensive coastal flooding. Because of the severity of their impacts, precise monitoring of these events and effective preparation for their occurrence are crucial to ensure the safety and resilience of populations and infrastructure.

For successful monitoring and preparation, access to relevant information associated with TC forecasts, such as risk projections and impact variables, must be adequate and user-friendly, enabling users to rapidly locate and comprehend the information they seek. To achieve this objective, visual tools and dashboards that concentrate interdisciplinary information and data from diverse sources serve as powerful summarization methods. Summary dashboards and tools facilitate easy access to information for all users, from experts and policymakers to ordinary citizens, offering a comprehensive overview of the situation and supporting informed decision-making. Current open-source tools for consulting TC data have limitations. They tend to be highly specialized, offering a limited selection of maps or graphs that cover only a portion of TC-related information. They also often lack interactivity, which restricts the user experience and the search for specific information. Furthermore, these tools can be complex to use due to inadequate documentation or challenges in presenting multiple pieces of information concurrently.

In this work, we introduce a novel free and open-source dashboard designed to surpass the limitations of existing tools, displaying a comprehensive set of information regarding TC hazards. TropiDash presents several strengths that enhance user experience and accessibility. Developed in the widely used Jupyter Notebook environment, it is easily accessible either through the installation guide in its GitHub repository or by launching its Binder environment. The dashboard features a user-friendly interface built with Python widgets and Voilà. It aggregates data from various sources spanning multiple domains: cyclone properties such as track forecasts and strike probability maps; atmospheric variable fields (wind speed and direction, temperature, precipitation); and risk and vulnerability information such as cyclone risk, coastal flood risk, and population density. All this is made available with a wide range of interactivity, from choosing the cyclone, to selecting the variables of interest, to roaming over the interactive maps.

The first version of TropiDash was realized in the context of Code for Earth 2023, a program for the development of open-source software organized by the European Centre for Medium-Range Weather Forecasts. Here we present an improved and optimized version.

How to cite: Paredes-Fortuny, L., Dainelli, F., and Colombo, P.: TropiDash: a comprehensive open-source dashboard for Tropical Cyclone data visualization and analysis, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-5658, https://doi.org/10.5194/egusphere-egu24-5658, 2024.

10:53–10:55 | EGU24-6254 | Virtual presentation
Peter Löwe

To initiate, maintain and accelerate behavioral change towards Open and FAIR practices, tangible benefits for scientific communities, and especially for early career scientists, are a critical success factor. The realization of such benefits, through due credit, funding, or other means, requires workflows, enabled by underlying infrastructures and standards, that are operational, reliable and trusted. Many efforts are under way to educate and motivate researchers to embrace and participate in Open and FAIR initiatives, including the open geospatial community software projects of the OSGeo foundation. Still, from the perspective of a developer of research software, the current service quality of offerings for PID-/citation-based credit remains limited, fickle, partially unpredictable and frustrating. This presentation demonstrates these challenges with real-world examples from OSGeo open geospatial projects, such as QGIS, GRASS GIS and PROJ, and the resulting PID references in publications. Further, a service-centered approach is introduced to enable both end users and Open/FAIR communities to assess overall service quality through Technology Readiness Levels (TRL), to improve the user experience by building trust, and to focus further development resources.

How to cite: Löwe, P.: Open geospatial research software in 2024: Assessing service quality with technology readiness levels, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-6254, https://doi.org/10.5194/egusphere-egu24-6254, 2024.

10:55–10:57 | PICO4.5 | EGU24-8146 | ECS | On-site presentation
Bart Schilperoort, Claire Donnelly, Yang Liu, and Gerbrand Koren

In the geosciences, different sources of data are often on different grids. These can be at different resolutions, but can also have their grid centers at different locations. To use these different sources of data in models or analyses, they have to be re-projected to a common grid. Popular tools for this are the command-line tool Climate Data Operators (CDO) and the Earth System Modeling Framework (ESMF).

These tools work well but have some downsides: CDO is a command-line tool, so regridded data has to be written to disk. ESMPy, the Python package for ESMF, is only available on Linux and macOS, and does not support out-of-core computing. Both tools rely on binary dependencies, which can make them more difficult to install. Additionally, many geoscientists already use xarray for analyzing and processing (netCDF) data.

For this use case we developed xarray-regrid, a lightweight xarray plugin which can regrid (rectilinear) data using the linear, nearest-neighbor, cubic, and conservative methods. The code is open source and modularly designed to facilitate the addition of alternative methods. Xarray-regrid is fully implemented in Python and therefore can be used on any platform. Using Dask, the computation is fully parallelized and can be performed out-of-core. This allows for fast processing of large datasets without running out of memory.

Xarray-regrid is available on the Python Package Index (pip install xarray-regrid), and its source code is available on GitHub at https://github.com/EXCITED-CO2/xarray-regrid 
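The package is invoked through an xarray accessor (calls of the form ds.regrid.conservative(target_grid), per its documentation). The conservative method itself can be sketched conceptually in plain NumPy on a single axis; the following illustrates the idea and is not the package's implementation:

```python
import numpy as np

def conservative_regrid_1d(src_edges, src_vals, tgt_edges):
    """Overlap-weighted (conservative) regridding along one rectilinear axis.

    Each target cell gets the mean of the source cells it overlaps,
    weighted by the overlap width, so the integral of the field over
    the axis is preserved.
    """
    out = np.empty(len(tgt_edges) - 1)
    for i in range(len(tgt_edges) - 1):
        lo, hi = tgt_edges[i], tgt_edges[i + 1]
        # overlap of target cell [lo, hi) with every source cell
        overlap = np.minimum(hi, src_edges[1:]) - np.maximum(lo, src_edges[:-1])
        overlap = np.clip(overlap, 0.0, None)
        out[i] = (overlap * src_vals).sum() / overlap.sum()
    return out

# Two unit-width source cells with values 0 and 2, regridded to a single
# target cell spanning both: the conservative mean is 1.0.
print(conservative_regrid_1d(np.array([0.0, 1.0, 2.0]),
                             np.array([0.0, 2.0]),
                             np.array([0.0, 2.0])))
```

In xarray-regrid the same weighting is applied lazily over Dask-chunked, multi-dimensional data, which is what enables out-of-core operation.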

How to cite: Schilperoort, B., Donnelly, C., Liu, Y., and Koren, G.: Xarray-regrid: regridding with ease, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-8146, https://doi.org/10.5194/egusphere-egu24-8146, 2024.

10:57–10:59 | PICO4.6 | EGU24-12004 | On-site presentation
Zac Deziel, Aimee Barciauskas, Jonas Solvsteen, Manil Maskey, Brian Freitag, Slesa Adhikari, Anthony Boyd, Alexandra Kirk, David Bitner, and Vincent Sarago

VEDA is an open-source science cyberinfrastructure for data processing, visualization, exploration, and geographic information systems (GIS) capabilities (https://www.earthdata.nasa.gov/esds/veda, https://www.earthdata.nasa.gov/dashboard/). NASA has always had open data policies, so its data has always been openly accessible to anyone, but NASA has not consistently exposed it through friendly interfaces or analytics platforms. VEDA attempts to make NASA’s Earth data mean more.

As VEDA supplies data and computing services through its dashboard and JupyterHub applications and engages with communities such as EGU, it is a critical component of NASA’s open science initiative. VEDA’s adoption of existing and emerging standards such as STAC, Cloud-Optimized GeoTIFFs, Zarr, the Features API, and the Tiles API ensures interoperability and reusability.

In the past year, VEDA has expanded its impact in 3 ways: (1) the reuse of its infrastructure to stand up the multi-agency Greenhouse Gas Center (https://earth.gov/ghgcenter, announced at COP28) and NASA’s Earth Information Center (https://earth.gov/), (2) the reuse of data APIs across applications, such as VEDA data in NASA’s Enterprise GIS, and (3) the generalization of the data system architecture into a free and open source framework called eoAPI. 

VEDA has also maintained and deepened its connections to the Multi-Mission Algorithm and Analysis Platform (MAAP). MAAP is a research data infrastructure (RDI) for above-ground biomass estimation. MAAP is reusing and contributing to the eoAPI data system and plans to integrate the analytics components (JupyterHub and data processing system) further.

Now that VEDA has stood up the GHG Center and the EIC, it has become a project where innovation happens. The VEDA team, composed of NASA project leads, scientists, designers, and developers, constantly works to resolve old and new challenges in managing EO architectures. For example, the team designs and implements interfaces to manage STAC metadata. eoAPI is a result of this innovative environment.

eoAPI is a new, open-source, installable combination of a data catalog and associated services for Earth observation and related data, with a cloud-infrastructure-first approach. eoAPI combines STAC data ingestion and hosting (pgSTAC) and querying services (stac-fastapi) with raster (TiTiler) and vector services (TiPg). eoAPI is built for reuse and has been used beyond VEDA, the GHG Center, and the EIC to deliver the Microsoft Planetary Computer and AWS ASDI data catalogs, and applications for the International Federation of the Red Cross and Mercy Corps.
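The querying side that stac-fastapi provides follows the STAC API Item Search specification, so clients of any eoAPI deployment speak the same protocol. A minimal POST /search request body can be sketched as follows (the collection id and bounding box are invented for illustration):

```python
import json

# Hypothetical STAC Item Search request: items from one collection that
# intersect a bounding box within a time window, as accepted by a
# stac-fastapi /search endpoint.
search_body = {
    "collections": ["sentinel-2-l2a"],  # illustrative collection id
    "bbox": [5.0, 50.0, 6.0, 51.0],     # lon/lat: west, south, east, north
    "datetime": "2023-06-01T00:00:00Z/2023-06-30T23:59:59Z",
    "limit": 10,
}
print(json.dumps(search_body, indent=2))
```

Because the body is standard STAC, the same request works against any of the deployments named above, which is the interoperability argument the abstract makes.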

This presentation will demo the current capabilities of eoAPI and VEDA and discuss how these capabilities were designed and architected with the central goals of science delivery, reproducible science, and interoperability to support the re-use of data and APIs across the Earth Science ecosystem of tools. The presentation will close with VEDA and eoAPI’s plans.

How to cite: Deziel, Z., Barciauskas, A., Solvsteen, J., Maskey, M., Freitag, B., Adhikari, S., Boyd, A., Kirk, A., Bitner, D., and Sarago, V.: NASA’s Open Science Platform VEDA (Visualization, Exploration and Data Analytics), EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12004, https://doi.org/10.5194/egusphere-egu24-12004, 2024.

10:59–11:01 | PICO4.7 | EGU24-12024 | On-site presentation
Thomas Knudsen
Rust Geodesy (RG) is an open source library, written in Rust [1], for experiments with geodetic transformations, software, and standards [2], [3]. RG originated from attempts to demonstrate architectural innovations for potential improvement of the ubiquitous transformation system PROJ, with which it consequentially shares many characteristics.
In parallel, however, RG has also evolved into a quite capable geodetic tool in its own right. And over the last few releases it has expanded from the "geometrical geodesy" background of PROJ, into supporting a number of operations from the realm of physical geodesy (deflections of the vertical, normal gravity models, gravity reduction, etc.), while still maintaining the key architectural feature of run time construction of complex operations from pipelines of simpler operators.
But in particular, the RG design has been nudged towards supporting the development and maintenance of geodetic transformations, as reflected by these characteristics:
  • A clear and compact syntax for specifying processing pipelines
  • ...but also syntactical backward compatibility and interoperability, through additional support for PROJ's older and more verbose syntax
  • Extensibility through straightforward, tight integration between system-supplied and user-written operators
  • ...but also support for loose integration with arbitrary ancillary software, through plain-text operator definitions and grid files
  • ...and ad-hoc abstractions through run-time defined user macros
  • Seamless interoperability with arbitrarily complex application program data structures, i.e. integrating with the user program rather than forcing the use of library-provided data structures, and
  • Support for round-trip consistency checks
The RG data flow architecture is based on the foundational concept of "coordinate sets" from the OGC/ISO geospatial standards series [4]. Hence, in contrast to PROJ operators, which operate on a single coordinate tuple, RG operators operate on an entire set of coordinate tuples at a time. While this may seem immaterial at the source code level, it gives the compiler a wider context for introducing vectorisation, leveraging the SIMD instruction sets of modern computers to transform more than one coordinate tuple at a time.
Recently, SIMD support has also arrived in the WebAssembly (Wasm) implementations of the major web platforms [5], and when compiled to Wasm, RG has proven to be a compact, lightweight and practical library for use on the web [6], [7]. So with RG's combined forays into the realms of Wasm and physical geodesy, the vista of "generic geodesy in the browser" is now more than just a mirage.
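As an illustration of the compact pipeline syntax, a minimal RG operator pipeline, built at run time from simpler operators, might read as follows (modelled on the project's ruminations [2]; treat the exact operator names as illustrative):

```
geo:in | utm zone=32
```

Read left to right: geographical coordinates in, UTM zone 32 coordinates out; additional operators can be chained with further `|` separators, and PROJ's more verbose syntax is accepted as an alternative spelling of the same pipeline.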
 
[1] Steve Klabnik and Carol Nichols, 2022: The Rust Programming Language, 2nd edition, 560 pp., San Francisco, CA, USA: No Starch Press
[2] Thomas Knudsen, 2021: Ruminations on Rust Geodesy: Overall architecture and philosophy.
URL: https://github.com/busstoptaktik/geodesy/blob/main/ruminations/000-rumination.md
[3] Thomas Knudsen: Geodesy. URL https://github.com/busstoptaktik/geodesy
[4] Roger Lott (ed), 2019: OGC Abstract Specification Topic 2: Referencing by coordinates.
URL https://docs.ogc.org/as/18-005r4/18-005r4.html
[5] WebAssembly Feature Extensions. URL: https://webassembly.org/features/
[6] Kyle Barron, 2023: Prototyping GeoRust + GeoArrow in WebAssembly. Efficient, vectorized geospatial operations in the browser,
URL https://observablehq.com/@kylebarron/prototyping-georust-geoarrow-in-webassembly
[7] Sean Rennie, 2023: Testing geodesy-wasm,
URL https://observablehq.com/d/3ff9d9b8f0b5168a

How to cite: Knudsen, T.: Generic geodesy in the browser? Recent developments in Rust Geodesy, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12024, https://doi.org/10.5194/egusphere-egu24-12024, 2024.

11:01–11:03 | PICO4.8 | EGU24-12786 | ECS | On-site presentation
Luis Felipe Patino Velasquez, Elizabeth Lewis, Jon Mills, and Stephen Birkinshaw

For many areas across the globe, physically-based hydrological models play a fundamental role in devising comprehensive and robust plans for future climate change adaptation and preparedness, informing water management and flood initiatives. Now that advances in satellite and sensor technology, coupled with the development of cloud computing, have enabled the advancement of hydrology as a data-intensive science, there is considerable impetus and interest in using these emerging technologies to develop new insights that contribute to fundamental aspects of the hydrological sciences. Whilst increasing volumes of Earth Observation (EO) data, coupled with advances in cloud computing, have enhanced hydrological modelling, one of the remaining challenges is ensuring a seamless data pipeline through to the final hydrological prediction. This poses a significant set of questions for the use of EO data in hydrology. The current research sits at the junction of three areas: physically-based hydrological modelling, satellite EO data, and the implementation of the Earth Observation Data Cube (EODC) paradigm. This presentation will outline the development and use of an open-source modelling workflow integrating analysis-ready data (ARD) through the Open Data Cube (ODC) data exploitation architecture with a physically-based, spatially-distributed hydrological model (SHETRAN), as a glimpse into the relevance of EO data cube solutions in lowering technology and EO data barriers. This enables users to harness existing open-source EO datasets and software at minimum cost and effort, with the objective of enabling a more open and reproducible hydrological science.

How to cite: Patino Velasquez, L. F., Lewis, E., Mills, J., and Birkinshaw, S.: Quantifying water security using hyperresolution hydrological modelling on top of an Open Data Cube (ODC), EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12786, https://doi.org/10.5194/egusphere-egu24-12786, 2024.

11:03–11:05 | PICO4.9 | EGU24-14025 | ECS | On-site presentation
Achieve Vector Data Cube by Apache Parquet Partition: Building an Analysis-Ready Global Lidar Data (GEDI and ICESat-2) for Earth System Science applications
(withdrawn)
Yu-Feng Ho, Rolf Simoes, Leandro Parente, and Tomislav Hengl
11:05–11:07 | PICO4.10 | EGU24-15972 | On-site presentation
Derek Karssenberg, Oliver Schmitz, and Kor de Jong

When developing large-scale numerical earth system models, knowledge of a broad range of programming technologies is traditionally required to support hardware from laptops up to supercomputers: knowledge that scientists specialized in a particular geosciences domain mostly do not have and often do not want to acquire. Their emphasis is on describing and implementing processes rather than, for instance, dealing with the parallelization of model equations. Moreover, when model characteristics or domain extents change, the chosen parallelisation technique may become obsolete or require significant refactoring. We develop the open-source LUE modelling framework, a software environment allowing domain scientists, who may not be familiar with the development of high-performance applications, to build numerical simulation models that seamlessly scale when additional hardware resources are added. LUE comprises a data model for the storage of field-based and agent-based data, and provides a broad range of map algebra operations as building blocks for model construction. Each spatial operation is implemented in C++ using HPX, a library and runtime environment providing asynchronous execution of interdependent tasks on both shared-memory and distributed computing systems. LUE provides a Python module and therefore a single high-level API, whether models are run on laptops or HPC systems. In our presentation we demonstrate two capabilities of LUE. First, using the built-in operations, we implemented a spatially distributed hydrological model including surface water routing; the model runs for the African continent at 100 metres spatial and hourly temporal resolution. Second, to demonstrate extensibility, we use LUE's focal operation framework to implement an individual kernel calculating greenness visibility exposure.
Our PICO presentation will also cover planned future extensions of the framework, in particular for agent-based modelling and the integration of machine learning model components.
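The map-algebra building blocks mentioned above can be illustrated with a focal (moving-window) operation. This plain-NumPy sketch of a 3×3 focal mean is conceptual and is not the LUE API; a framework like LUE would evaluate such an operation as a graph of asynchronously scheduled tasks over partitions of the field:

```python
import numpy as np

def focal_mean_3x3(field):
    """3x3 focal mean over a 2-D field, with edge-padded boundaries.

    Map-algebra frameworks expose operations like this as single calls;
    here the window sum is accumulated from nine shifted views of the
    padded array, then divided by the window size.
    """
    rows, cols = field.shape
    padded = np.pad(field, 1, mode="edge")
    out = np.zeros((rows, cols), dtype=float)
    for di in range(3):
        for dj in range(3):
            out += padded[di:di + rows, dj:dj + cols]
    return out / 9.0

field = np.arange(9.0).reshape(3, 3)
print(focal_mean_3x3(field))
```

The value of a framework such as LUE is that a model written from calls like this one scales across shared-memory and distributed systems without the modeller handling the partitioning.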

How to cite: Karssenberg, D., Schmitz, O., and de Jong, K.: The LUE open source software for building numerical simulation models on HPC, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15972, https://doi.org/10.5194/egusphere-egu24-15972, 2024.

11:07–11:09 | PICO4.11 | EGU24-19155 | On-site presentation
Mostafa Hadizadeh, Christof Lorenz, Sabine Barthlott, Romy Fösig, Katharina Loewe, Corinna Rebmann, Benjamin Ertl, Robert Ulrich, and Felix Bach

In the rapidly advancing domain of environmental research, the deployment of a comprehensive, state-of-the-art Research Data Management (RDM) framework is increasingly pivotal. Such a framework is key to ensuring FAIR data, laying the groundwork for transparent and reproducible earth system sciences.

Today, datasets associated with research articles are commonly published via prominent data repositories like Pangaea or Zenodo. Conversely, data used in actual day-to-day research and inter-institutional projects tends to be shared through basic cloud storage solutions or, even worse, via email. This practice, however, often conflicts with the FAIR principles, as much of this data ends up in private, restricted systems and local storage, limiting its broader accessibility and use.

In response to this challenge, our research project Cat4KIT aims to establish a cross-institutional catalog and Research Data Management framework. The Cat4KIT framework is, hence, an important building block towards the FAIRification of environmental data. It not only streamlines the process of ensuring availability and accessibility of large-scale environmental datasets but also significantly enhances their value for interdisciplinary research and informed decision-making in environmental policy.

The Cat4KIT system comprises four essential elements: data service provision, (meta)data harvesting, catalogue service, and user-friendly data presentation. The data service provision module is tailored to facilitate access to data within typical storage systems by using well-defined and standardized community interfaces via tools like the THREDDS data server, Intake catalogues, and the OGC SensorThings API. By this, we ensure seamless data retrieval and management for typical use cases in environmental sciences.

(Meta)data harvesting via our DS2STAC package entails collecting metadata from various data services, followed by creating STAC metadata and integrating it into our STAC-API-based catalogue service.
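The STAC metadata produced in this step has a small, well-defined core: each record is a GeoJSON Feature with a few STAC-specific fields. A minimal STAC Item looks roughly like this (ids, coordinates and the asset href are invented for illustration, not DS2STAC output):

```python
import json

# Minimal STAC Item: a GeoJSON Feature carrying STAC-specific fields.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-station-2023-06-01",  # invented id
    "geometry": {"type": "Point", "coordinates": [8.43, 49.01]},
    "bbox": [8.43, 49.01, 8.43, 49.01],
    "properties": {"datetime": "2023-06-01T12:00:00Z"},
    "assets": {
        "data": {
            "href": "https://example.org/data/station.nc",  # invented href
            "type": "application/x-netcdf",
        }
    },
    "links": [],
}
print(json.dumps(item, indent=2))
```

Because every harvested dataset is reduced to this common shape, a single STAC-API-based catalogue can index data from heterogeneous institutional services.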

The catalog service module merges diverse datasets into a cohesive, searchable spatial catalog, enhancing data discoverability and utility via our Cat4KIT UI.

Finally, our framework's data portal is tailored to elevate data accessibility and comprehensibility for a wide audience, including researchers, enabling them to efficiently search, filter, and navigate through data from decentralized research data infrastructures.
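Searching and filtering in such a portal typically maps onto STAC API item searches. The request body below is a generic sketch of that pattern (the collection name is invented, not part of the Cat4KIT project): results are restricted to one collection, a bounding box, and a datetime interval.

```python
import json

# Hypothetical STAC API search body: items from one collection,
# restricted to a bounding box (min lon, min lat, max lon, max lat)
# and an RFC 3339 datetime interval.
search_body = {
    "collections": ["kit-soil-moisture"],
    "bbox": [8.0, 48.5, 9.0, 49.5],
    "datetime": "2024-01-01T00:00:00Z/2024-01-31T23:59:59Z",
    "limit": 50,
}

# A client would POST this JSON to the catalogue's /search endpoint
# and page through the returned item collection.
payload = json.dumps(search_body)
```

The same query structure works against any STAC-API-compliant service, which is what makes federated search across decentralized research data infrastructures feasible.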

One notable characteristic of Cat4KIT is its reliance on open-source solutions and its strict adherence to community standards. This guarantees not only the framework's interoperability with current data systems but also its easy adaptation and extension to meet future needs. Our presentation demonstrates the technical structure of Cat4KIT, examining the development and integration of each module to adhere to the FAIR principles. Additionally, it showcases examples illustrating the practical use of the framework in real-life situations, emphasizing its efficacy in enhancing data management practices within KIT and its potential relevance for other research organizations.

How to cite: Hadizadeh, M., Lorenz, C., Barthlott, S., Fösig, R., Loewe, K., Rebmann, C., Ertl, B., Ulrich, R., and Bach, F.: FAIR Environmental Data through a STAC-Driven Inter-Institutional Data Catalog Infrastructure – Status quo of the Cat4KIT-project, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19155, https://doi.org/10.5194/egusphere-egu24-19155, 2024.

11:09–11:11
|
PICO4.12
|
EGU24-19421
|
On-site presentation
Stefan Blumentrath, Yngve Are Antonsen, Aron Widforss, Niklas Fossli Gjersø, Rune Verpe Engeset, and Solveig Havstad Winsvold

The Norwegian Water Resources and Energy Directorate (NVE) is tasked with the management of water and energy resources in Norway, as well as with reducing the risk of damage associated with landslides and flooding. Copernicus satellite data can provide valuable insight for these tasks.

The vast amount of Copernicus data, however, requires scalable and robust processing solutions. Standardized and modular workflows help safeguard the maintainability and efficiency of service delivery. To deliver operational Copernicus services at national scale, NVE introduced the Open Source OSGeo Community project actinia together with the Open Source Apache Airflow software as its platform.

actinia (https://actinia-org.github.io/) is a REST API for scalable, distributed, high-performance processing of satellite image time series as well as geographical raster and vector data. It is a modular system that mainly uses GRASS GIS for computational tasks.
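actinia jobs are described as JSON "process chains" that list GRASS GIS modules to execute in order. The fragment below sketches such a chain for a simple terrain analysis; the map names are invented, and the exact schema should be checked against the actinia documentation:

```python
import json

# Sketch of an actinia process chain: set the computational region to an
# elevation raster, then derive slope and aspect with GRASS GIS modules.
# Raster names ("elevation", "slope", "aspect") are illustrative.
process_chain = {
    "version": "1",
    "list": [
        {
            "id": "set_region",
            "module": "g.region",
            "inputs": [{"param": "raster", "value": "elevation"}],
            "flags": "p",
        },
        {
            "id": "slope_aspect",
            "module": "r.slope.aspect",
            "inputs": [{"param": "elevation", "value": "elevation"}],
            "outputs": [
                {"param": "slope", "value": "slope"},
                {"param": "aspect", "value": "aspect"},
            ],
        },
    ],
}

# A client would POST this chain to an asynchronous actinia processing
# endpoint and poll the returned resource URL for status and results.
payload = json.dumps(process_chain)
```

Because each chain is plain JSON, an orchestrator such as Apache Airflow can template, submit, and monitor many such jobs as tasks within a scheduled workflow.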

Apache Airflow (https://airflow.apache.org/) is an orchestration solution for programmatically authoring, scheduling, and monitoring workflows.

In the presentation, we will illustrate how Apache Airflow and actinia work together and present selected examples of current and future applications operationalized on the platform. These applications currently cover:

  • avalanches
  • flooding
  • snow cover
  • lake ice

More services related to NVE's areas of responsibility are being investigated, such as landslides, slush flows, glacier lake outburst floods, and specific land cover changes.

Finally, we discuss challenges and opportunities of using Open Source Software tools and collaborative science approaches at NVE in national, operational services.

How to cite: Blumentrath, S., Are Antonsen, Y., Widforss, A., Fossli Gjersø, N., Verpe Engeset, R., and Havstad Winsvold, S.: Implementing National Copernicus services for hydrology and natural hazard monitoring at NVE using Open Source tools Apache Airflow and actinia, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19421, https://doi.org/10.5194/egusphere-egu24-19421, 2024.

11:11–11:13
|
PICO4.13
|
EGU24-20842
|
On-site presentation
Tamrat Belayneh

Geospatial users have long constructed immersive 3D environments for diverse applications such as urban planning, environmental and geological studies, and 3D analysis, and more recently for replicating the physical world as a digital twin. In this PICO presentation, we illustrate the dynamic evolution of Indexed 3D Scene Layers (I3S), an OGC Community Standard designed for efficient streaming and storage of substantial geospatial content. I3S has rapidly adapted to encompass new use cases and techniques, pushing the boundaries of geospatial visualization and analysis.

I3S facilitates the efficient transmission of diverse 3D geospatial data types, ranging from discrete 3D objects with attributes and integrated surface meshes to extensive point cloud data covering expansive geographic regions. Moreover, it excels in streaming highly detailed Building Information Model (BIM) content to web browsers, mobile applications, and desktop platforms.

The most recent enhancement to OGC's I3S streaming standard, the Building Scene Layer (BSL), introduces a sophisticated framework for effective tiling of massive BIM content. BSL leverages a Bounding Volume Hierarchy (BVH) with geometric-error-driven selection and display criteria, incorporates attribute-driven filtering, and employs various graphics optimizations. Together, these advancements enable the seamless streaming of otherwise voluminous BIM 3D assets.
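The selection criterion driving such level-of-detail streaming is commonly a screen-space error test: a node's geometric error, expressed in model units, is projected to pixels for the current camera, and the node is refined (its children loaded) when the projected error exceeds a tolerance. The sketch below shows the common perspective-projection formulation of this test in generic form; it is an illustration of the technique, not a quotation of the normative I3S rule:

```python
import math

def screen_space_error(geometric_error, distance, screen_height_px, fov_y_rad):
    """Project a node's geometric error (model units) to pixels for a
    perspective camera with the given vertical field of view.
    Generic LOD formulation used by streaming 3D formats."""
    return (geometric_error * screen_height_px) / (
        2.0 * distance * math.tan(fov_y_rad / 2.0)
    )

def should_refine(geometric_error, distance, screen_height_px=1080,
                  fov_y_rad=math.radians(60), max_sse_px=16.0):
    # Refine (stream in children) when the projected error is too large.
    return screen_space_error(
        geometric_error, distance, screen_height_px, fov_y_rad
    ) > max_sse_px

# A coarse node far away is acceptable; the same node up close is refined.
far_ok = should_refine(geometric_error=5.0, distance=2000.0)    # False
near_refine = should_refine(geometric_error=5.0, distance=50.0)  # True
```

Tuning the pixel tolerance trades visual fidelity against bandwidth, which is exactly the lever that lets massive BIM content stream smoothly to browsers and mobile clients.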

During this session, we will spotlight the practical implementation of I3S BSL across diverse ecosystems, including loaders.gl and CesiumJS. This flexibility empowers users to select their preferred front-end application based on specific requirements and preferences.

How to cite: Belayneh, T.: Democratizing BIM Data Access in Digital Twins Through OGC I3S 3D Streaming Standard, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20842, https://doi.org/10.5194/egusphere-egu24-20842, 2024.

11:13–11:15
|
EGU24-20137
|
Virtual presentation
Felix Mühlbauer, Martin Hammitzsch, Marc Hanisch, Gunnar Pruß, Rainer Häner, and Oliver Rach

Modern Earth sciences produce a continuously increasing amount of data. These data consist of measurements/observations and descriptive information (metadata), including semantic classifications (semantics). Depending on the geoscientific parameter, metadata are stored in a variety of databases, standards, and semantics, which hampers interoperability by limiting data access and exchange, searchability, and comparability. Examples of common data types with very different structures and metadata needs are maps, geochemical data derived from field samples, or time series measured with a sensor at a point, such as precipitation or soil moisture.

So far, there is a large gap between the capabilities of databases to capture metadata and their practical use. ALAMEDA is designed as a modular metadata management platform for the curation, compilation, administration, visualization, storage, and sharing of meta-information from lab, field, and modelling datasets. As a pilot application for stable isotope and soil moisture data, ALAMEDA will enable searching, accessing, and comparing meta-information across organizational, system, and domain boundaries.

ALAMEDA covers five major categories: observation & measurements, sample & data history, sensors & devices, methods & processing, and environmental characteristics (spatio-temporal). These categories are hierarchically structured, interlinkable, and filled with specific metadata attributes (e.g. name, date, location, and methods for sample preparation, measurement, and data processing). For the pilot, all meta-information will be provided by existing and well-established data management tools (e.g. mDIS, SMS, LI2).
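The five categories can be pictured as one nested, linkable record per dataset. The sketch below renders that structure with invented, simplified attribute names purely to show the hierarchy; the actual ALAMEDA schema may differ:

```python
from dataclasses import dataclass, field

# Simplified, illustrative rendering of ALAMEDA's five metadata
# categories as a nested record; attribute names are invented.
@dataclass
class SensorDevice:          # category: sensors & devices
    name: str
    serial_number: str

@dataclass
class MethodProcessing:      # category: methods & processing
    sample_preparation: str
    data_processing: str

@dataclass
class MetadataRecord:
    observation: str                                     # observation & measurements
    sample_history: list = field(default_factory=list)   # sample & data history
    sensor: SensorDevice = None
    method: MethodProcessing = None
    environment: dict = field(default_factory=dict)      # environmental (spatio-temporal)

record = MetadataRecord(
    observation="volumetric soil moisture",
    sample_history=["collected 2023-05-10", "calibrated", "measured"],
    sensor=SensorDevice(name="TDR probe", serial_number="SN-0042"),
    method=MethodProcessing(sample_preparation="none",
                            data_processing="hourly mean"),
    environment={"lat": 49.0, "lon": 8.4, "period": "2023-05/2023-09"},
)
```

Interlinking then amounts to records referencing shared sensor, method, or environment entries, which is what allows comparison across organizational and domain boundaries.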

In ALAMEDA, all information is brought together and made available via web interfaces. Furthermore, the project focuses on features such as metadata curation with intuitive graphical user interfaces, the adoption of well-established standards, the use of domain-controlled vocabularies, and the provision of interfaces for standards-based dissemination of aggregated information. Finally, ALAMEDA is to be integrated into the DataHub (Hub-Terra).

The project is currently in its final phase, and we will present the developed concepts, the software, and lessons learned.

How to cite: Mühlbauer, F., Hammitzsch, M., Hanisch, M., Pruß, G., Häner, R., and Rach, O.: ALAMEDA – A scalable multi-domain metadata management platform, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20137, https://doi.org/10.5194/egusphere-egu24-20137, 2024.

11:15–12:30